Re: [R] Assigning variable value as name to cbind column
Why does the naming have to be done inside the cbind()? How about

    dataTest <- data.frame(col1 = c(1,2,3))
    new.data <- c(1,2)
    name <- "test"
    length(new.data) <- nrow(dataTest)
    newDataTest <- cbind(dataTest, new.data)
    names(newDataTest)[[ncol(newDataTest)]] <- name
    newDataTest
      col1 test
    1    1    1
    2    2    2
    3    3   NA

?

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ralf B
Sent: Friday, 25 June 2010 3:48 PM
To: r-help@r-project.org
Subject: [R] Assigning variable value as name to cbind column

Hi all, I have this (non-working) script:

    dataTest <- data.frame(col1=c(1,2,3))
    new.data <- c(1,2)
    name <- "test"
    n.row <- dim(dataTest)[1]
    length(new.data) <- n.row
    names(new.data) <- name
    cbind(dataTest, name=new.data)
    print(dataTest)

and would like to bind the new column 'new.data' to 'dataTest' by using the value of the variable 'name' as the column name. The end result should look like this:

      col1 test
    1    1    1
    2    2    2
    3    3   NA

The best I got was that 'name' became the column name but never the actual value of 'name'. How can I do that? (This is actually a function that runs many times, which means a manual workaround is not feasible.) Ralf

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
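Since the original poster needs this inside a function that runs many times, the same idea can be wrapped up as below. This is only a sketch: add_named_column() is a hypothetical helper name, not something in base R, and it assumes the new vector is never longer than the data frame is tall.

```r
# Sketch: append a vector as a new column whose name is held in a variable.
# add_named_column() is a hypothetical helper, not part of base R.
add_named_column <- function(df, values, name) {
  length(values) <- nrow(df)   # pads with NA so the lengths match
  df[[name]] <- values         # [[<- uses the *value* of `name` as the column name
  df
}

dataTest <- data.frame(col1 = c(1, 2, 3))
result <- add_named_column(dataTest, c(1, 2), "test")
print(result)
```

Assigning through df[[name]] avoids cbind() altogether, which is why the column ends up called "test" rather than "name".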
Re: [R] Wilcoxon signed rank test and its requirements
The values come from this kind of process: the musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over music bars, which produces a smoother distribution and makes the 'comparison curve' (which shows the distances from the reference set through a musical piece) more readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original values. Then I want to pick only some regions from the piece and test whether the values in those regions are higher than the mean of all values. Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300

I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.)

I would like to test whether the mean of the sample differs significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

The histogram of the population looks like the attached histogram; what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population. -- David.

Atte

On 06/24/2010 12:40 PM, David Winsemius wrote: On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Thanks. What I have had to ask is: how do you test that the data is symmetric enough? If it is not, is it OK to use some data transformation, when it is said:

The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value.

You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing Nonparametric statistical methods for complete and censored data by M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.

To add to what David and others have said, look at the kernel that the U-statistic associated with the WSR test uses: the indicator (0/1) of xi + xj > 0. So WSR tests H0: p = 0.5, where p = the probability that the average of a randomly chosen pair of values is positive. [If there are ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0], i neq j.] Frank

-- Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University
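The quantity Frank describes can be looked at directly. The following is a rough sketch only: pair_sum_positive() is a hypothetical helper name, the data are simulated, and the function simply estimates the two probabilities P[xi + xj > 0] and P[xi + xj < 0] over distinct pairs. It is not how wilcox.test() computes its statistic internally.

```r
# Sketch: estimate P[x_i + x_j > 0] and P[x_i + x_j < 0] over pairs i != j.
# pair_sum_positive() is a hypothetical helper name.
pair_sum_positive <- function(x) {
  sums <- outer(x, x, "+")          # all pairwise sums x_i + x_j
  pairs <- sums[upper.tri(sums)]    # keep each unordered pair i < j once
  c(positive = mean(pairs > 0), negative = mean(pairs < 0))
}

set.seed(1)
x <- rnorm(50, mean = 0.3)          # simulated data, for illustration only
print(pair_sum_positive(x))
print(wilcox.test(x, mu = 0)$p.value)
```

Under the null hypothesis the two proportions should be roughly equal; with many ties some pair sums are exactly zero, which is why both proportions are reported.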
[R] installing multicore package
Sir, I want to use the mclapply() function in my analysis, so I have to install the multicore package, but I cannot install it.

    install.packages("multicore")

reports that package 'multicore' is not available. Can you help me? Regards, Suman Dhara
[R] best way to plot a evolution in time
Hi everyone, I have the following question: given three objects, let's say:

    a <- c(2, 5, 15, 16)
    b <- c(1, 1, 8, 8)
    c <- c(10, 10, 11, 11)
    m <- matrix(c(a,b,c), byrow=T, nrow=3)
    rownames(m) <- c('gene a', 'gene b', 'gene c')
    m
    gene.dist <- dist(m, method='euclidean')
    gene.dist

which is the best way to plot their evolution in time? Should I use a levelplot or just a normal plot? If I use a normal plot, how do I plot evolution in time?

-- View this message in context: http://r.789695.n4.nabble.com/best-way-to-plot-a-evolution-in-time-tp2267993p2267993.html
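One possible way to draw this with base graphics is sketched below. It assumes the four values of each gene are measurements at four equally spaced time points, which the post does not actually state, and it uses the names gene_a, gene_b and gene_c instead of a, b and c purely to avoid masking the c() function.

```r
# Sketch: one line per gene across the time points (time on the x-axis).
# Assumes the columns of m are equally spaced time points 1..4.
gene_a <- c(2, 5, 15, 16)
gene_b <- c(1, 1, 8, 8)
gene_c <- c(10, 10, 11, 11)
m <- matrix(c(gene_a, gene_b, gene_c), byrow = TRUE, nrow = 3)
rownames(m) <- c("gene a", "gene b", "gene c")

# matplot() draws one series per column, so transpose to get genes as columns.
matplot(seq_len(ncol(m)), t(m), type = "b", pch = 1:3, lty = 1, col = 1:3,
        xlab = "time point", ylab = "value")
legend("topleft", legend = rownames(m), pch = 1:3, lty = 1, col = 1:3)
```

A levelplot would suit many genes better; for three series, simple lines are usually easier to read.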
[R] i want create script
Hi R community, I want to create a script that takes a .csv table as input, does some prediction, and writes the output to a file. The input is an Excel sheet containing some tables of data; the output should be a table of predicted data. Will someone help me in this regard? Thanks in advance. I am using R on Windows. Please advise on the procedure to create an Rscript. Regards, Vijay, Research student, Bangalore, India

-- View this message in context: http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
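The post gives no details of the data or the model, so the following is only a skeleton of what such a script might look like. Everything specific in it is an assumption: the file names, the column names x and y, the linear model, and the function name run_prediction().

```r
# predict_csv.R: a minimal sketch; file names, column names and the model are
# placeholders (assumptions), not taken from the original post.
# Possible use from a Windows command prompt:
#   Rscript predict_csv.R train.csv new.csv predictions.csv
run_prediction <- function(train_file, new_file, out_file) {
  train <- read.csv(train_file)            # data used to fit the model
  newdata <- read.csv(new_file)            # rows to predict for
  fit <- lm(y ~ x, data = train)           # assumed columns: y (response), x (predictor)
  newdata$predicted <- predict(fit, newdata = newdata)
  write.csv(newdata, out_file, row.names = FALSE)
  invisible(newdata)
}

# Only run when exactly three file names are supplied on the command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 3) {
  run_prediction(args[1], args[2], args[3])
}
```

An Excel sheet would first have to be saved as .csv for read.csv() to read it.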
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate-free approach to the analysis of form using landmark data, developed by Subhash Lele and Joan Richtsmeier. They also developed a computer program (http://www.getahead.psu.edu/comment/edma.asp) that performs several techniques including EDMA I-II, but I wonder whether there is a package or code available in R to perform EDMA... thanks

-- View this message in context: http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268018.html
Re: [R] Popularity of R, SAS, SPSS, Stata...
I add some scientific references for Google Insights for Search:

* Google Predicting the Present http://www.google.com/googleblogs/pdfs/google_predicting_the_present.pdf
* Google Econometrics and Unemployment Forecasting http://ftp.iza.org/dp4201.pdf
* Query Indices and a 2008 Downturn: Israeli Data http://www.bankisrael.gov.il/deptdata/mehkar/papers/dp0906e.pdf

If it is considered a useful tool, and I think it is, it is important to reflect on which keywords to use. Second, it is interesting to see these trends in comparison with other indicators such as the number of users/posts on mailing lists, Google Scholar citations, etc...
[R] Optimizing given two vectors of data
I am trying to estimate an Arrhenius-exponential model in R. I have one vector of data containing failure times, and another containing the corresponding temperatures. I am trying to optimize a maximum likelihood function given BOTH these vectors. However, the optim command seems to take only one vector argument (the parameters). How can I pass both data vectors into the function?

-- View this message in context: http://r.789695.n4.nabble.com/Optimizing-given-two-vectors-of-data-tp2268002p2268002.html
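One way to do this is through the `...` argument of optim(), which forwards any extra named arguments to the objective function. The sketch below is an illustration under assumptions: the parameterisation (mean failure time = exp(par[1] + par[2] / temperature)), the simulated data, the starting values, and the names negloglik, times and temps are all made up for the example, not taken from the post.

```r
# Negative log-likelihood for exponential failure times whose mean follows an
# Arrhenius-type relationship: mean_i = exp(par[1] + par[2] / temp_i).
# This parameterisation is an assumption for illustration.
negloglik <- function(par, times, temps) {
  log_mean <- par[1] + par[2] / temps
  sum(log_mean + times / exp(log_mean))
}

set.seed(42)
temps <- rep(c(300, 350, 400), each = 50)     # simulated temperatures (Kelvin)
times <- rexp(length(temps), rate = 1 / exp(-5 + 3000 / temps))

# optim() forwards the extra named arguments (times, temps) to fn unchanged;
# only `par` is varied during the optimisation.
fit <- optim(par = c(0, 1000), fn = negloglik, times = times, temps = temps)
print(fit$par)
```

The extra argument names must not clash with optim()'s own arguments (par, fn, gr, method and so on). With parameters on such different scales, the default Nelder-Mead search may need better starting values or rescaling to converge well.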
Re: [R] Wilcoxon signed rank test and its requirements
BTW, if there is no reasonably powerful test that would be suitable for my purpose (because of the ties and the shape of the data), could I proceed this way? It is also worth comparing different samples taken from the data. Since the mean and sd of the data are available, could I approximate p-values using a z- or t-test, just to compare several different samples? Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300

I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.)

I would like to test whether the mean of the sample differs significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

The histogram of the population looks like the attached histogram; what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population. -- David.

Atte

On 06/24/2010 12:40 PM, David Winsemius wrote: On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Thanks. What I have had to ask is: how do you test that the data is symmetric enough? If it is not, is it OK to use some data transformation, when it is said:

The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value.

You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing Nonparametric statistical methods for complete and censored data by M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.

To add to what David and others have said, look at the kernel that the U-statistic associated with the WSR test uses: the indicator (0/1) of xi + xj > 0. So WSR tests H0: p = 0.5, where p = the probability that the average of a randomly chosen pair of values is positive. [If there are ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0], i neq j.] Frank

-- Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University
[R] Confused: Looping in dataframes
Hey, I have a data frame x which consists of, say, 10 vectors. I essentially want to find the best-fit exponential smoothing model for each of the vectors. The problem: while I get results when I say lapply(x, ets), I get an error when I say

    myprint <- function(x) {
      for (i in 1:length(x)) {
        ets(x[i], model="AZZ", opt.crit=c("amse"))
      }
    }

The error message is:

    Error in ets(x[i], model = "AZZ", opt.crit = c("amse")) : y should be a univariate time series

Could someone please explain why this is happening? I also want to be able to extract data like coefficients and errors (MAPE, MSE etc.). Thanks and regards, Phani

-- A. Phani Kishan, 3rd Year B.Tech, Dept. of Computer Science Engineering, IIT MADRAS, Ph: +919962363545
Re: [R] Confused: Looping in dataframes
On 06/25/2010 10:02 AM, phani kishan wrote:

Hey, I have a data frame x which consists of, say, 10 vectors. I essentially want to find the best-fit exponential smoothing model for each of the vectors. The problem: while I get results when I say lapply(x, ets), I get an error when I say

    myprint <- function(x) {
      for (i in 1:length(x)) {
        ets(x[i], model="AZZ", opt.crit=c("amse"))

Hi, Please provide a reproducible example, as stated in the posting guide. My guess is that replacing x[i] by x[[i]] would solve the problem. Double brackets return a vector instead of a data.frame that has just column i. cheers, Paul

      }
    }

The error message is:

    Error in ets(x[i], model = "AZZ", opt.crit = c("amse")) : y should be a univariate time series

Could someone please explain why this is happening? I also want to be able to extract data like coefficients and errors (MAPE, MSE etc.). Thanks and regards, Phani

-- Drs. Paul Hiemstra, Department of Physical Geography, Faculty of Geosciences, University of Utrecht, Heidelberglaan 2, P.O. Box 80.115, 3508 TC Utrecht, Phone: +3130 253 5773, http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
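The difference Paul describes is easy to check on a small data frame. The two-column data frame below is made up for the illustration.

```r
# Sketch: single vs double brackets on a data frame (made-up data).
x <- data.frame(SKU1 = c(583.8, 441.7, 454.2), SKU2 = c(574.6, 552.8, 555.7))

print(class(x[1]))    # single brackets keep the data.frame wrapper
print(class(x[[1]]))  # double brackets extract the column itself
```

A function that expects a plain numeric vector or a univariate time series, as the error message suggests ets() does, will therefore accept x[[i]] but not x[i].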
[R] Sweave: The opposite of tangle
Hi, I am using Sweave to write an article. If I want to convert the *.rnw to a *.tex file I have to run Sweave, which might take a long time. Is there a way to get a tex file as the result without evaluating the R chunks, i.e. the opposite of tangle (which gives just the R chunks)? Thanks, Stefan
Re: [R] BBH2 and FrF2 packages
Hi: MEPlot, IAPlot and cubePlot come from the FrF2 package; the DanielPlot function is in both the BsMD and FrF2 packages. Try library(FrF2) and then run your code again; it worked for me...

If you check the list of functions in BHH2 under HTML help, you'll find that none of the plot functions you used below are in that package, but they are all in FrF2. HTH, Dennis

On Thu, Jun 24, 2010 at 4:17 PM, Andrea Bernasconi DG andrea.bernasconi...@gmail.com wrote:

Hi R HELP, I consider the 2^3 factorial experiment described on page 177 of the book Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. Box, J. Stuart Hunter, William G. Hunter (BHH2). This example uses the following data, in file BHH2-Data/tab0502.dat at ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip

      run  T  C  K  y
    1   1 -1 -1 -1 60
    2   2  1 -1 -1 72
    3   3 -1  1 -1 54
    4   4  1  1 -1 68
    5   5 -1 -1  1 52
    6   6  1 -1  1 83
    7   7 -1  1  1 45
    8   8  1  1  1 80

Using these data and the R BHH2 package, I was not able to reproduce the very simple results in the BHH2 book. In particular, the following solution will have no meaning since K is categorical:

    ( plan <- lm(y ~ (T+C+K)^2, data = DATA) )
    MEPlot(plan)  # Main Effects
    IAPlot(plan)  # Interactions Effects
    DanielPlot(plan)
    cubePlot(plan, "T", "C", "K")

I decided to rebuild the data using:

    plan <- FrF2(8, 3, factor.names=c("T","C","K"), default.level=c("-","+"), randomize = FALSE)
    ( plan <- add.response(plan, y) )

giving:

      T C K  y
    1 - - - 60
    2 + - - 72
    3 - + - 54
    4 + + - 68
    5 - - + 52
    6 + - + 83
    7 - + + 45
    8 + + + 80
    class=design, type= full factorial

Unfortunately the following plot commands do not work:

    MEPlot(plan)
    IAPlot(plan)
    DanielPlot(plan)

The error is:

    Error in MEPlot.design(plan) : The design obj must be of a type containing FrF2 or pb.

Why? If I add a fake factor to the plan the plot commands work, but the solution will have no meaning:

    plan <- FrF2(8, 4, factor.names=c("T","C","K","Q"), default.level=c("-","+"), randomize = FALSE)
    ( plan <- add.response(plan, y) )
    MEPlot(plan)
    IAPlot(plan)
    DanielPlot(plan)

Sincerely, Andrea B.
[R] HEGY.test, error Mypi not found
Hi, I'd like to use the HEGY test from the uroot package (see attachment) and get the following error message:

    error in dimnames(Mypi)[[2]] <- paste("Ypi", 1:s, sep = "") : Object 'Mypi' not found

For the air passenger example on http://127.0.0.1:11997/library/uroot/html/HEGY.test.html it works, but for my time series it doesn't (giving names to the columns and rows did not help either...). Thanks for your help. Jessica
Re: [R] Simple qqplot question
Sorry, missed the two-variable thing. Go with the lm solution then, and you can tweak the plot yourself (the confidence intervals are easily obtained via predict(lm.object, interval="prediction") ). The function qq.plot uses robust regression, but in your case normal regression will do. Regarding the shapes: this just indicates both tails are shorter than expected, so you have a kurtosis less than 3 (or negative, depending on whether you do the correction or not). Cheers, Joris

On Fri, Jun 25, 2010 at 4:10 AM, Ralf B ralf.bie...@gmail.com wrote:

Short rep: I have two distributions, data and data2, each built from about 3 million data points; they appear similar when looking at densities and histograms. I plotted qqplots for further eye-balling:

    qqplot(data, data2, xlab = "1", ylab = "2")

and get an almost perfect diagonal line, which means they are in fact very alike. Now I tried to check normality using qqnorm, and I think I am doing something wrong here:

    qqnorm(data, main = "Q-Q normality plot for 1")
    qqnorm(data2, main = "Q-Q normality plot for 2")

I am getting perfect S-shaped curves (??) for both distributions. Am I missing something here?

[ASCII sketch of the S-shaped Q-Q curve]

Thanks, Ralf

-- Joris Meys, Statistical consultant, Ghent University, Faculty of Bioscience Engineering, Department of Applied mathematics, biometrics and process control, tel : +32 9 264 59 87, joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Wilcoxon signed rank test and its requirements
As a remark on your histogram: use fewer breaks! This histogram tells you nothing. An interesting function is ?density, e.g.:

    x <- rnorm(250)
    hist(x, freq=F)
    lines(density(x), col="red")

See also this ppt, a very nice and short introduction to graphics in R: http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen atte...@utu.fi:

Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300

You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed.

I would like to test whether the mean of the sample differs significantly from the population mean.

According to probability theory, this will be the case in 5% of the samples if you repeat your sampling infinitely. But as David asked: why on earth do you want to test that? cheers, Joris

-- Joris Meys, Statistical consultant, Ghent University, Faculty of Bioscience Engineering, Department of Applied mathematics, biometrics and process control, tel : +32 9 264 59 87, joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
[R] Finding sets
Hi all, I'd like to find how many sets of 1s there are in the following example:

    x <- rep(c(1,2,1,3,5), each=5)

I know that there are two sets of 1s, visually. Is there any function in R that allows me to automate the process? Thanks. Muhammad
Re: [R] Finding sets
Hi: Here's one approach:

    x <- rep(c(1,2,1,3,5), each=5)
    rle(x)
    Run Length Encoding
      lengths: int [1:5] 5 5 5 5 5
      values : num [1:5] 1 2 1 3 5

    table(rle(x)$values)

    1 2 3 5
    2 1 1 1

    unname(table(rle(x)$values))[1]
    [1] 2

HTH, Dennis

On Fri, Jun 25, 2010 at 2:30 AM, Muhammad Rahiz muhammad.ra...@ouce.ox.ac.uk wrote:

Hi all, I'd like to find how many sets of 1s there are in the following example:

    x <- rep(c(1,2,1,3,5), each=5)

I know that there are two sets of 1s, visually. Is there any function in R that allows me to automate the process? Thanks. Muhammad
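The rle() approach above can be wrapped in a small function so that the value of interest is selected by its value rather than by its position in the table. count_runs() is a hypothetical helper name used only for this sketch.

```r
# Sketch: count how many separate runs ("sets") of a given value occur in x.
# count_runs() is a hypothetical helper name.
count_runs <- function(x, value) {
  runs <- rle(x)                # consecutive equal values collapsed into runs
  sum(runs$values == value)     # number of runs whose value matches
}

x <- rep(c(1, 2, 1, 3, 5), each = 5)
print(count_runs(x, 1))
```

Matching on the value also returns 0 for values that never occur, whereas indexing the table by position would silently pick a different value.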
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
I've been looking around myself, but I couldn't find any. Maybe somebody will chime in to direct you to the correct places. I also checked the papers, and it seems not too hard to implement. If I find some time, I'll take a look at it next week. For the other two gentlemen, check:

http://www.getahead.psu.edu/PDF/EuclideanDistanceMatrixAnalysis.pdf
http://www.getahead.psu.edu/PDF/no.1.pdf

Cheers, Joris

On Fri, Jun 25, 2010 at 8:30 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote:

In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate-free approach to the analysis of form using landmark data, developed by Subhash Lele and Joan Richtsmeier. They also developed a computer program (http://www.getahead.psu.edu/comment/edma.asp) that performs several techniques including EDMA I-II, but I wonder whether there is a package or code available in R to perform EDMA... thanks

-- View this message in context: http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268018.html

-- Joris Meys, Statistical consultant, Ghent University, Faculty of Bioscience Engineering, Department of Applied mathematics, biometrics and process control, tel : +32 9 264 59 87, joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Correctly plotting bar and scatter chart on 2-y axis plot with par(new=T)
On 06/25/2010 05:47 AM, dan.weavesham wrote:

Hello, Thanks for the advice so far; still struggling with it, I must admit. Here is some sample data, which I hope helps:

    # y axis #1 -- data for the bar chart
    -30353.382 -21693.519 -7049.923 -72968.722 -10267.584 -269432.795 -19847.670
    -686283.171 -376231.754 -597800.080 -274637.587 -112663.167 -39550.445 -133916.431
    # x axis -- in specific order, so cannot be tampered with !! ;-)
    1 7 13 2 8 14 3 9 4 10 5 11 6 12
    # y axis #2 -- scatter chart
    50 25 5 25 5 100 5 100 100 75 75 50 50 50

Does this help explain what I'm looking to do? If not, is there a way I can get plot() to not change the order of the x axis data points, so that instead of plotting 1,2,3,n as per my original post, it plots 1,7,13,n? (I've tried coercing the data into character format with no luck.)

Hi Dan, Does this do what you want?

    # y axis #1 -- data for the bar chart
    y1 <- c(-30353.382,-21693.519,-7049.923,-72968.722,-10267.584,-269432.795,
      -19847.670,-686283.171,-376231.754,-597800.080,-274637.587,-112663.167,
      -39550.445,-133916.431)
    # x axis -- in specific order, so cannot be tampered with !! ;-)
    # these are really the x tick labels
    x <- c(1,7,13,2,8,14,3,9,4,10,5,11,6,12)
    # y axis #2 -- scatter chart
    y2 <- c(50,25,5,25,5,100,5,100,100,75,75,50,50,50)
    library(plotrix)
    twoord.plot(1:14,y1,1:14,y2,type=c("bar","p"),lcol=2,rcol=4,
      lylim=c(-7,7),rylim=c(-100,100),xtickpos=1:14,xticklab=x)

Jim
Re: [R] Sweave: The opposite of tangle
stefan.d...@gmail.com wrote:

Hi, I am using Sweave to write an article. If I want to convert the *.rnw to a *.tex file I have to run Sweave, which might take a long time. Is there a way to get a tex file as the result without evaluating the R chunks, i.e. the opposite of tangle (which gives just the R chunks)? Thanks, Stefan

This is untested, but does Sweave("file.rnw", eval=FASLE) do what you want?

-- Kevin E. Thorpe, Biostatistician/Trialist, Knowledge Translation Program, Assistant Professor, Dalla Lana School of Public Health, University of Toronto, email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
Re: [R] installing multicore package
On 25.06.2010 06:39, suman dhara wrote:

Sir, I want to use the mclapply() function in my analysis, so I have to install the multicore package, but I cannot install it.

    install.packages("multicore")

reports that package 'multicore' is not available. Can you help me?

If this is Windows (unstated) we cannot help, since multicore is not available for that platform. Uwe Ligges

Regards, Suman Dhara
Re: [R] Sweave: The opposite of tangle
Kevin E. Thorpe wrote:

stefan.d...@gmail.com wrote:

Hi, I am using Sweave to write an article. If I want to convert the *.rnw to a *.tex file I have to run Sweave, which might take a long time. Is there a way to get a tex file as the result without evaluating the R chunks, i.e. the opposite of tangle (which gives just the R chunks)? Thanks, Stefan

This is untested, but does Sweave("file.rnw", eval=FASLE) do what you want?

That should be FALSE above. Don't post before coffee.

-- Kevin E. Thorpe, Biostatistician/Trialist, Knowledge Translation Program, Assistant Professor, Dalla Lana School of Public Health, University of Toronto, email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
Re: [R] Confused: Looping in dataframes
On Fri, Jun 25, 2010 at 1:54 PM, Paul Hiemstra p.hiems...@geo.uu.nl wrote:

On 06/25/2010 10:02 AM, phani kishan wrote:

Hey, I have a data frame x which consists of, say, 10 vectors. I essentially want to find the best-fit exponential smoothing model for each of the vectors. The problem: while I get results when I say lapply(x, ets), I get an error when I say

    myprint <- function(x) {
      for (i in 1:length(x)) {
        ets(x[i], model="AZZ", opt.crit=c("amse"))

Hi, Please provide a reproducible example, as stated in the posting guide. My guess is that replacing x[i] by x[[i]] would solve the problem. Double brackets return a vector instead of a data.frame that has just column i.

Hey Paul, As requested. My example data frame sdata:

         SKU1  SKU2   SKU3  SKU4
    1   583.8 574.6 1106.9 648.1
    2   441.7 552.8 1021.3 353.6
    3   454.2 555.7  998.3 306.4
    4   569.7 507.6  811.1 360.7
    5   512.3 620.0 1046.3 713.9
    6   580.8 668.2  732.0 490.9
    7   648.5 766.9  653.4 422.1
    8   617.4 657.1  602.1 190.8
    9   826.8 767.3  640.5 324.1
    10 1163.0 657.6  429.6 181.1
    11  643.5 788.9  569.1 331.9
    12  846.9 568.6  425.1 224.6
    13  580.7 582.9  434.2 226.9

Now when I apply lapply(sdata, ets) I get this result:

    $SKU1
    ETS(A,N,N)
    Call: ets(y = x, model = "AZZ")
    Smoothing parameters: alpha = 0.3845
    Initial states: l = 533.3698
    sigma: 181.7615
         AIC     AICc      BIC
    172.6144 173.8144 173.7443

    $SKU2
    ETS(A,N,N)
    Call: ets(y = x, model = "AZZ")
    Smoothing parameters: alpha = 0.5026
    Initial states: l = 567.821
    sigma: 86.7074
         AIC     AICc      BIC
    153.3704 154.5704 154.5003

    $SKU3
    ETS(A,A,N)
    Call: ets(y = x, model = "AZZ")
    Smoothing parameters: alpha = 1e-04 beta = 1e-04
    Initial states: l = 1189.2221 b = -64.3776
    sigma: 85.4153
         AIC     AICc      BIC
    156.9800 161.9800 159.2398

    $SKU4
    ETS(A,A,N)
    Call: ets(y = x, model = "AZZ")
    Smoothing parameters: alpha = 1e-04 beta = 1e-04
    Initial states: l = 566.9001 b = -27.8818
    sigma: 127.2654
         AIC     AICc      BIC
    167.3475 172.3475 169.6073

Now when I run the same using:

    myfun <- function(x) {
      for (i in 1:length(x)) {
        ets(x[i])
      }
    }

I got the error mentioned before. Now on modifying it to

    myfun <- function(x) {
      for (i in 1:length(x)) {
        return(ets(x[[i]]))
      }
    }

I only got this output:

    ETS(A,N,N)
    Call: ets(y = x[[i]], model = "AZZ", opt.crit = c("amse"))
    Smoothing parameters: alpha = 0.3983
    Initial states: l = 516.188
    sigma: 181.8688
         AIC     AICc      BIC
    172.6298 173.8298 173.7597

I think it's considering the whole data frame as one series. As I said, my objective is essentially to come up with the best exponential smoothing model for each of the SKUs in the data frame. However, I want to be able to extract information like MSE, MAPE etc. later. So kindly suggest. Thanks in advance, Phani

cheers, Paul

      }
    }

The error message is:

    Error in ets(x[i], model = "AZZ", opt.crit = c("amse")) : y should be a univariate time series

Could someone please explain why this is happening? I also want to be able to extract data like coefficients and errors (MAPE, MSE etc.). Thanks and regards, Phani

-- Drs. Paul Hiemstra, Department of Physical Geography, Faculty of Geosciences, University of Utrecht, Heidelberglaan 2, P.O. Box 80.115, 3508 TC Utrecht, Phone: +3130 253 5773, http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770

-- A. Phani Kishan, 3rd Year B.Tech, Dept. of Computer Science Engineering, IIT MADRAS, Ph: +919962363545
Re: [R] Popularity of R, SAS, SPSS, Stata...
On Sun, Jun 20, 2010 at 2:31 PM, Muenchen, Robert A (Bob) muenc...@utk.edu wrote:
> come up with so far at http://r4stats.com/popularity . I'm sure people
> will have plenty of ideas on how to improve this, so please let me know
> what you think.

This is not much of a metric, probably not even a ballpark, but I have a
habit of measuring the popularity of a piece of software by the number of
unread messages in my mail account that were sent to one of its main mailing
lists. For example, I subscribed to the Gentoo, Xfce and LyX MLs much earlier
than to that of R, but R quickly surpassed them all in number of unread
messages. At the moment I have the following: R (37k), LyX (10k), Debian
(7k), Xfce (3k), Geany (.5k). I dare say that R might be more popular than
Debian, but again, any such estimation seems far-fetched.

Regards
Liviu
Re: [R] i want create script
I'd suggest having a look at the manuals on the R site
(http://www.r-project.org), especially "An Introduction to R" and "R Data
Import/Export". Some helpful tutorials may be found at
http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html and
http://www.sph.umich.edu/csg/abecasis/class/815.05.pdf

--- On Fri, 6/25/10, vijaysheegi vijay.she...@gmail.com wrote:
> From: vijaysheegi vijay.she...@gmail.com
> Subject: [R] i want create script
> To: r-help@r-project.org
> Received: Friday, June 25, 2010, 2:26 AM
>
> Hi R community,
> I want to create a script which will take a .csv table as input, do some
> prediction, and write the output to a file. The input is an Excel sheet
> containing some tables of data. The output should be a table of predicted
> data. Will someone help me in this regard?
> Thanks in advance. I am using R on Windows. Please advise on the procedure
> to create an R script.
>
> Regards
> - Vijay
> Research student
> Bangalore, India
> --
> View this message in context:
> http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] ask a question about list in R
On Jun 25, 2010, at 1:00 AM, song song wrote:
> My list al is as below:
>
>     > al = list(c(2,3),5,7)
>     > al
>     [[1]]
>     [1] 2 3
>
>     [[2]]
>     [1] 5
>
>     [[3]]
>     [1] 7
>
> I check the second component; its element is 5, so I remove it. Now my
> al is:
>
>     > al[[2]][al[[2]] != 5] -> al[[2]]
>     > al
>     [[1]]
>     [1] 2 3
>
>     [[2]]
>     numeric(0)
>
>     [[3]]
>     [1] 7
>
> The question is how I can get the new list without the second component,
> that is:
>
>     > alwanted
>     [[1]]
>     [1] 2 3
>
>     [[2]]
>     [1] 7

Another way:

    > al = list(c(2,3),5,7)
    > al[-2]
    [[1]]
    [1] 2 3

    [[2]]
    [1] 7

    > alwanted <- al[-2]

That is negative indexing with the [ operator.

--
David.
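The negative-indexing answer above drops a component by position. As a minimal sketch in base R (the variable names by_position, by_value and non_empty are illustrative, not from the thread), the same result can also be reached by value, or by discarding components that became empty after filtering:

```r
al <- list(c(2, 3), 5, 7)

# Drop a component by position: negative indexing with single brackets.
by_position <- al[-2]

# Drop every component that is exactly the single value 5.
is_five <- vapply(al, function(e) identical(e, 5), logical(1))
by_value <- al[!is_five]

# Filter the elements inside each component, then drop the empty ones.
filtered <- lapply(al, function(e) e[e != 5])
non_empty <- filtered[lengths(filtered) > 0]
```

All three approaches leave a two-component list here; which one fits depends on whether the position, the value, or the emptiness of a component is what is known in advance.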
Re: [R] Handouts / Reports or just simply printing text to PDF?
Check out the brew package, by Jeff Horner.

Ralf B wrote:
> I assume R won't easily generate nice reports (unless one starts using
> Sweave and LaTeX), but perhaps somebody here knows a package that can
> create report-like output for special cases? How can I simply plot output
> into PDF? Perhaps you know a package I should check out? What do you guys
> do to create handouts (before actually publishing)?

--
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com
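For the narrower question of getting printed output into a PDF without Sweave or brew, one option is the base pdf() graphics device. The following is only a sketch under assumptions: the output path and the use of the built-in cars data set are placeholders, and long output would need manual pagination.

```r
# Hypothetical output location; any writable path would do.
out_file <- file.path(tempdir(), "handout.pdf")

# Capture whatever would normally be printed to the console.
report_lines <- capture.output(summary(cars))

pdf(out_file, width = 8.5, height = 11)
plot.new()                                   # blank page to draw text on
text(0, 1, paste(report_lines, collapse = "\n"),
     adj = c(0, 1), family = "mono", cex = 0.9)
plot(cars, main = "Speed vs. stopping distance")  # second page: a figure
dev.off()
```

This keeps everything in base R at the cost of layout control; brew or Sweave remain the better choice once the handout needs headings, tables or mixed text and figures.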
Re: [R] Finding sets
On Jun 25, 2010, at 5:43 AM, Dennis Murphy wrote:
> Hi:
> Here's one approach:
>
>     > x <- rep(c(1,2,1,3,5), each=5)
>     > rle(x)
>     Run Length Encoding
>       lengths: int [1:5] 5 5 5 5 5
>       values : num [1:5] 1 2 1 3 5
>     > table(rle(x)$values)
>
>     1 2 3 5
>     2 1 1 1
>     > unname(table(rle(x)$values))[1]
>     [1] 2

This method does not require visual inspection of the intermediate result:

    > sum(rle(x)$values == 1)
    [1] 2

--
David.

> HTH,
> Dennis
>
> On Fri, Jun 25, 2010 at 2:30 AM, Muhammad Rahiz
> muhammad.ra...@ouce.ox.ac.uk wrote:
>> Hi all,
>> I'd like to find how many sets of 1s there are in the following example:
>>
>>     x <- rep(c(1,2,1,3,5), each=5)
>>
>> I know, visually, that there are two sets of 1s. Is there any function
>> in R that allows me to automate the process?
>> Thanks.
>> Muhammad
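The rle() idea from this thread can be wrapped in a small helper. This is a sketch only; the function name count_runs and its min_length argument are assumptions added for illustration, not something proposed in the thread.

```r
# Count the runs ("sets") of `value` in x that are at least min_length long.
count_runs <- function(x, value, min_length = 1) {
  r <- rle(x)
  sum(r$values == value & r$lengths >= min_length)
}

x <- rep(c(1, 2, 1, 3, 5), each = 5)
runs_of_one <- count_runs(x, 1)          # runs of 1s of any length
long_runs   <- count_runs(x, 1, 6)       # runs of 1s with six or more elements
```

Because rle() compares neighbouring elements for equality, this works best on integer-like or character data; floating-point values that differ only by rounding error would be treated as separate runs.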
Re: [R] Optimizing given two vectors of data
optim uses vectors of _parameters_, not of data. You supply a (likelihood)
function, give initial values of the parameters, and get the optimized
parameters back. See ?optim and the examples therein. It contains an example
of optimization using multiple data columns.

Cheers
Joris

On Fri, Jun 25, 2010 at 8:12 AM, confusedSoul ruchir_402...@infosys.com wrote:
> I am trying to estimate an Arrhenius-exponential model in R. I have one
> vector of data containing failure times, and another containing the
> corresponding temperatures. I am trying to optimize a maximum likelihood
> function given BOTH these vectors. However, the optim command takes only
> one such vector parameter. How can I pass both vectors into the function?
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Optimizing-given-two-vectors-of-data-tp2268002p2268002.html
> Sent from the R help mailing list archive at Nabble.com.

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
Thanks for your interest, Joris.

Gokhan OCAKOGLU
Uludag University
Faculty of Medicine
Department of Biostatistics
http://www20.uludag.edu.tr/~biostat/ocakoglui.htm
--
View this message in context:
http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268257.html
Sent from the R help mailing list archive at Nabble.com.
[R] (no subject)
Hello,
I'm new to R, but from what I have read it is an excellent tool. I would
appreciate your help: I am trying to create an array from reading a text
file. The idea is to read the file and transform the data into binary
format. For example, the file has this format:

    A,B,C,D,G
    A,C,E,O
    F,G

I want to turn it into this:

      a b c d e f g o
    1 1 1 1 1 0 0 1 0
    2 1 0 1 0 1 0 0 1
    3 0 0 0 0 0 1 0 0

and display it on the monitor.
Thanks for the help
Re: [R] Confused: Looping in dataframes
On Jun 25, 2010, at 7:09 AM, phani kishan wrote:
> On Fri, Jun 25, 2010 at 1:54 PM, Paul Hiemstra p.hiems...@geo.uu.nl wrote:
>> My guess is that replacing x[i] by x[[i]] would solve the problem. Double
>> brackets return a vector instead of a data.frame that has just column i.
>
> [snip]
>
> Now when I run the same using:
>
>     myfun <- function(x) {
>       for (i in 1:length(x)) {
>         ets(x[i])
>       }
>     }
>
> I get the error mentioned before. After modifying it to
>
>     myfun <- function(x) {
>       for (i in 1:length(x)) {
>         return(ets(x[[i]]))
>       }
>     }
>
> I only get this output:
>
>     ETS(A,N,N)
>     Call: ets(y = x[[i]], model = "AZZ", opt.crit = c("amse"))
>     Smoothing parameters: alpha = 0.3983
>     Initial states: l = 516.188
>     sigma: 181.8688
>          AIC     AICc      BIC
>     172.6298 173.8298 173.7597
>
> I think it's considering the whole data frame as one series.

Doubtful. It is quietly calculating all of the requested models, but you did
not do anything with them inside the loop (which is inside a function). You
could have assigned them to something permanent or printed them (or both):

    ets_x <- list()
    for (i in 1:length(x)) {
      print(ets(x[[i]]))
      # wrap in list() so each fitted model stays one list component
      ets_x <- c(ets_x, list(ets(x[[i]])))
    }
    ets_x

> As I said, my objective is essentially to come up with the best
> exponential smoothing model for each of the SKUs in the data frame.
> However, I want to be able to extract information like MSE, MAPE etc.
> later. So kindly suggest.
>
> Thanks in advance,
> Phani
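The two points made in this thread, x[[i]] versus x[i] and storing each fit so it can be inspected later, can be sketched as follows. To keep the example runnable without the forecast package, stats::HoltWinters() with beta = FALSE and gamma = FALSE (simple exponential smoothing) is used as a stand-in for ets(); that substitution, the helper name fit_one, and the use of only two of the four SKU columns are assumptions made for illustration.

```r
# Two of the columns from the example data in this thread.
sdata <- data.frame(
  SKU1 = c(583.8, 441.7, 454.2, 569.7, 512.3, 580.8, 648.5,
           617.4, 826.8, 1163.0, 643.5, 846.9, 580.7),
  SKU2 = c(574.6, 552.8, 555.7, 507.6, 620.0, 668.2, 766.9,
           657.1, 767.3, 657.6, 788.9, 568.6, 582.9)
)

# Stand-in for ets(): simple exponential smoothing from base R.
fit_one <- function(v) HoltWinters(ts(v), beta = FALSE, gamma = FALSE)

# Pre-allocate a named list and fill one component per column.
fits <- vector("list", length(sdata))
names(fits) <- names(sdata)
for (i in seq_along(sdata)) {
  fits[[i]] <- fit_one(sdata[[i]])   # [[i]] gives the column as a numeric vector
}

# Accuracy figures can be pulled out of the stored fits afterwards.
# HoltWinters reports the sum of squared one-step errors as $SSE.
mse <- sapply(fits, function(f) f$SSE / (nrow(sdata) - 1))
```

With the forecast package loaded, replacing the body of fit_one with a call to ets() should give one stored model per SKU in the same way; lapply(sdata, fit_one) is the loop-free equivalent.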
Re: [R] Confused: Looping in dataframes
Hey,
I only got the output once because I was returning from the function at the
end of the first iteration. I set that right and I have printed the values.
The function I am using now is:

    function(x) {
      for (i in 1:length(x)) {
        print(names(x[i]))
        print(myets(x[[i]]))
      }
    }

where myets is my customized exponential smoothing model. However, the
problem is that if I run my myets function individually on each of the SKUs
I get values of MAPE, MSE etc., but by running the above loop I don't get the
values. How do I store the values so I can look at them later?

There are minor (not significant) changes in the parameter values from
applying the above function as opposed to lapply. Why could that be?

Phani

--
A. Phani Kishan
3rd Year B.Tech
Dept. of Computer Science Engineering
IIT MADRAS
Ph: +919962363545
Re: [R] Assigning variable value as name to cbind column
Try this also:

    cbind(dataTest, `colnames<-`(cbind(new.data[1:nrow(dataTest)]), name))

On Fri, Jun 25, 2010 at 2:47 AM, Ralf B ralf.bie...@gmail.com wrote:
> Hi all,
> I have this (non-working) script:
>
>     dataTest <- data.frame(col1=c(1,2,3))
>     new.data <- c(1,2)
>     name <- "test"
>     n.row <- dim(dataTest)[1]
>     length(new.data) <- n.row
>     names(new.data) <- name
>     cbind(dataTest, name=new.data)
>     print(dataTest)
>
> and would like to bind the new column 'new.data' to 'dataTest' by using
> the value of the variable 'name' as the column name. The end result should
> look like this:
>
>       col1 test
>     1    1    1
>     2    2    2
>     3    3   NA
>
> The best I got was that 'name' became the column name, but never the
> actual value of 'name'. How can I do that? (This is actually a function
> that runs many times, which means a manual workaround is not feasible.)
>
> Ralf

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O
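As a further sketch of the same task in base R: a column can be named from the value of a variable either by assigning with [[ ]] or by naming the piece before cbind(). The object names newDataTest and viaCbind are illustrative only.

```r
dataTest <- data.frame(col1 = c(1, 2, 3))
new.data <- c(1, 2)
name <- "test"

# Pad the shorter vector with NA so it matches the number of rows.
length(new.data) <- nrow(dataTest)

# Option 1: [[<- uses the *value* of `name` as the column name.
newDataTest <- dataTest
newDataTest[[name]] <- new.data

# Option 2: name a one-column data frame first, then bind it.
viaCbind <- cbind(dataTest, setNames(data.frame(new.data), name))
```

In cbind(dataTest, name = new.data) the argument tag `name` is taken literally, which is why the original script produced a column called "name" rather than "test".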
Re: [R] Popularity of R, SAS, SPSS, Stata...
-----Original Message-----
From: Liviu Andronic [mailto:landronim...@gmail.com]
Sent: Friday, June 25, 2010 7:15 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

> On Sun, Jun 20, 2010 at 2:31 PM, Muenchen, Robert A (Bob)
> muenc...@utk.edu wrote:
>> come up with so far at http://r4stats.com/popularity . I'm sure people
>> will have plenty of ideas on how to improve this, so please let me know
>> what you think.
>
> This is not much of a metric, probably not even a ballpark, but I have a
> habit of measuring the popularity of a piece of software by the number of
> unread messages in my mail account that were sent to one of its main
> mailing lists. For example, I subscribed to the Gentoo, Xfce and LyX MLs
> much earlier than to that of R, but R quickly surpassed them all in number
> of unread messages. At the moment I have the following: R (37k), LyX
> (10k), Debian (7k), Xfce (3k), Geany (.5k). I dare say that R might be
> more popular than Debian, but again, any such estimation seems
> far-fetched.
>
> Regards
> Liviu

Hi Liviu,

E-mail was the thing that got me back to this paper. I had been working on
variations of measures for several years and was frustrated mostly by how
many problems I ran into regarding search logic (SAS stands for about 15
scientific topics, and of course R is far worse). I have all my listserv
email routed to a set of folders which I always empty at the same time. I
noticed that recently R-Help had really taken off and that Statalist had
surpassed SAS-L. So I got the latest monthly data from the listservs and
switched the program from doing yearly counts to means of the monthly
figures so I could add 2010 to it.

Figure 1 at http://r4stats.com/popularity is indeed the number of emails
sent by each of the listservs. All these measures have their own
limitations, but I find that graph the most interesting since it includes
the trends across time.

Cheers,
Bob
Re: [R] i want create script
Please read the posting guide: http://www.R-project.org/posting-guide.html

Your question is very vague. One could assume you're completely new to R and
want the commands to read a csv file (see ?read.csv) and to write a table to
a file (e.g. ?write.table to write your predicted data in a text format).

My guess is you want to run this script in the shell without having to open
R, similar to a Perl script. For this, take a look at:

http://cran.r-project.org/doc/manuals/R-intro.html#Scripting-with-R
http://projects.uabgrid.uab.edu/r-group/wiki/CommandLineProcessing

Cheers
Joris

On Fri, Jun 25, 2010 at 8:26 AM, vijaysheegi vijay.she...@gmail.com wrote:
> Hi R community,
> I want to create a script which will take a .csv table as input, do some
> prediction, and write the output to a file. The input is an Excel sheet
> containing some tables of data. The output should be a table of predicted
> data. Will someone help me in this regard?
> Thanks in advance. I am using R on Windows. Please advise on the procedure
> to create an R script.
>
> Regards
> - Vijay
> Research student
> Bangalore, India
> --
> View this message in context:
> http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
> Sent from the R help mailing list archive at Nabble.com.

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
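A minimal sketch of such a script follows. Everything specific in it is an assumption, since the original question gives no details: the file name predict_script.R, the column names x and y, and the choice of a linear model are placeholders for whatever data and prediction method are actually needed.

```r
# predict_script.R (hypothetical file name)
# Reads a CSV with columns x and y, fits a linear model on the rows where
# y is known, and writes the input plus a `predicted` column to a new CSV.
run_prediction <- function(infile, outfile) {
  dat <- read.csv(infile)
  fit <- lm(y ~ x, data = dat[!is.na(dat$y), ])
  dat$predicted <- predict(fit, newdata = dat)
  write.csv(dat, outfile, row.names = FALSE)
  invisible(dat)
}

# When run through Rscript, take the two file names from the command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 2) {
  run_prediction(args[1], args[2])
}
```

From a shell or a Windows command prompt this would be invoked as Rscript predict_script.R input.csv output.csv, assuming Rscript is on the PATH. An Excel sheet would first have to be saved as CSV.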
Re: [R] create group markers in original data frame ie. countinued... ? to calculate sth for groups defined between points in one variable (string), /separating/ spliting variable into groups by i.e.
Dear useRs,

First, thank you, Joris Meys, for explaining how to calculate results for
groups lying between string marks in one variable of a data frame, as in the
example below (between START and STOP). I would like to complete it by asking
how each observation in the original data set can be marked with its group.

# So first, below:
# START ... working example of the solution proposed by: Joris Meys [jorism...@gmail.com]
# Same trick:

    c0 <- rbind( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17)
    c0
    c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10,  0, NA, 20, 10.3444)
    c1
    c2 <- rbind(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")
    c2
    pos <- which(!is.na(C.df$c2))
    idx <- sapply(2:length(pos), function(i) pos[i-1]:(pos[i]-1))
    names(idx) <- sapply(2:length(pos), function(i)
        paste(C.df$c2[pos[i-1]], "-", C.df$c2[pos[i]]))
    out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
    out

# STOP ... from: Sent: Thu 2010-06-24 18:02: Joris Meys [jorism...@gmail.com]
# Thank you, it is done and works very well.
# - - - - - - - - - - - - - - - - - -
# Now I try to finish my question: to add a grouping symbol to the whole set,
# so that each observation is marked with the name of the interval in which
# that observation is placed, to tell the observer that this observation is
# between ... A and B, to enable sorting, and to enable simple access using
# match:

    in_sub_starting_from <- rbind(NA, "A", "A", "A", "B", "B", "B", "B", "B", "B", "B",
                                  "C", "C", "C", "C", "C", "C")
    in_sub_finished_by   <- rbind(NA, "B", "B", "B", "C", "C", "C", "C", "C", "C", "C",
                                  "D", "D", "D", "D", "D", "D")
    in_sub_limited_by    <- rbind(NA, "A-B", "A-B", "A-B", "B-C", "B-C", "B-C", "B-C",
                                  "B-C", "B-C", "B-C", "C-D", "C-D", "C-D", "C-D",
                                  "C-D", "C-D")
    C.df <- data.frame(c0, c1, c2, in_sub_starting_from, in_sub_finished_by,
                       in_sub_limited_by)
    C.df

# Therefore my one further question: how is it possible to create these
# vectors automatically, having C.df$c2 (and of course also C.df$c0 and
# C.df$c1):

    C.df$in_sub_starting_from
    C.df$in_sub_finished_by
    C.df$in_sub_limited_by

# to tell the observer that this observation is between ... A and B, to
# enable sorting, and to enable simple access using match; for example, to
# make this kind of access to the data possible:
# to take the 7th observation from any row of the data frame,

    C.df$c0[7]
    C.df$c1[c0==7]

# and to be able to
# find in this same row, in in_sub_starting_from, that the observation is preceded by ...

    C.df$in_sub_starting_from[c0==7]

# find in this same row, in in_sub_finished_by, that the observation is before ...

    C.df$in_sub_finished_by[c0==7]

# find in this same row, in in_sub_limited_by, that the observation is between ...

    C.df$in_sub_limited_by[c0==7]

# ?
# Thanks for any advice, and maybe for this answer as well,
# impatiently looking forward to the next time I have access to the internet...
# Sincerely, Kaluza

And the beginning of this story:

-----Original Message-----
From: Eugeniusz Kaluza
Sent: Thu 2010-06-24 17:12
To: r-help@r-project.org
Subject: PD: [R] ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2

Dear useRs,

Thanks for the advice from Joris Meys. Now I will try to think how to make it
work for a less specific case, to make the problem more general. The result
should then be displayed for every group between non-empty strings in c2,
i.e. not only the result for:

    # mean:
    # c1        c3      c4      c5
    # 20        Start1  Stop1   Start1-Stop1
    # 25.48585  Start2  Stop2   Start2-Stop2

but also for every group created by the space between the two closest
strings in c2, which contains only series of NA, NA, NA, separated from time
to time by one string, i.e.:

    # mean:
    # c1        c3      c4      c5
    # 20        Start1  Stop1   Start1-Stop1
    # ..        Stop1   Start2  Stop1-Start2
    # 25.48585  Start2  Stop2   Start2-Stop2

i.e. to rewrite this, maybe with another, simpler version of the command, so
that it also works for every group created by the space between the two
closest strings in c2, which contains only series of NA, NA, NA, separated
from time to time by one string: A, NA, NA, NA, NA, B, NA, NA, NA, C, NA, NA,
NA, NA, D, NA, NA, i.e.:

    # mean:
    # c1        c3  c4  c5
    # 20        A   B   A-B
    # ..        B   C   B-C
    # 25.48585  C   D   C-D
    # ...

I am looking for a more general method (function) for grouping between these
letters in c2. I will now try to study the solution proposed by Joris Meys.

Thanks for the immediate answer,
Kaluza

-----Original Message-----
From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Thu 2010-06-24 15:14
To: Eugeniusz Kaluza
Cc: r-help@r-project.org
Subject: Re: [R] ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable
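One possible way to build the three marker columns asked for above is sketched here in base R. It is not taken from the thread: it assumes the markers are the non-NA entries of c2, that rows before the first marker get NA, and that the last marker row belongs to the preceding interval, which is what the hand-written vectors in the message imply. Plain vectors are used instead of the one-column matrices produced by rbind().

```r
c2 <- c(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")

labels <- c2[!is.na(c2)]                  # "A" "B" "C" "D"

# Running count of markers seen so far: 0 before "A", 1 from "A" on, ...
grp <- cumsum(!is.na(c2))
grp <- pmin(grp, length(labels) - 1L)     # last marker row joins the previous interval
grp[grp == 0L] <- NA                      # rows before the first marker have no group

in_sub_starting_from <- labels[grp]
in_sub_finished_by   <- labels[grp + 1L]
in_sub_limited_by    <- ifelse(is.na(grp), NA,
                               paste(in_sub_starting_from, in_sub_finished_by, sep = "-"))
```

The resulting vectors can be added to the data frame with data.frame() or cbind(), after which lookups such as C.df$in_sub_limited_by[C.df$c0 == 7] work as described in the message.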
Re: [R] Optimizing given two vectors of data (confusedSoul)
> I am trying to estimate an Arrhenius-exponential model in R. I have one
> vector of data containing failure times, and another containing the
> corresponding temperatures. I am trying to optimize a maximum likelihood
> function given BOTH these vectors. However, the optim command takes only
> one such vector parameter. How can I pass both vectors into the function?

You need to combine your vectors:

    params <- c(vecone, vectwo)

Inside your objective function, you will need to split them out again.

However, I have some suspicions that you are referring to the DATA for the
function rather than the parameters that are being optimized. The data go
into the '...' arguments to optim and other optimization tools.

JN
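To make the '...' mechanism concrete, here is a sketch with simulated data. The model form (exponential failure times with mean exp(a + b * 1000 / T)), the parameter values, and all variable names are assumptions chosen for illustration; the original poster's exact Arrhenius parameterisation is not given in the thread.

```r
set.seed(1)

# Simulated data: 20 failure times at each of five temperatures (Kelvin).
temp_K    <- rep(c(300, 325, 350, 375, 400), each = 20)
true_par  <- c(-6, 3)
mean_life <- exp(true_par[1] + true_par[2] * 1000 / temp_K)
fail_time <- rexp(length(temp_K), rate = 1 / mean_life)

# Negative log-likelihood: `par` is what optim varies, while `times` and
# `temps` are the two data vectors.
neg_loglik <- function(par, times, temps) {
  theta <- exp(par[1] + par[2] * 1000 / temps)   # mean life at each temperature
  sum(log(theta) + times / theta)
}

start <- c(0, 0)
# Both data vectors are passed by name through optim's `...` argument.
fit <- optim(start, neg_loglik, times = fail_time, temps = temp_K)
fit$par
```

The extra argument names should not partially match optim's own leading arguments (par, fn, gr), which is why descriptive names such as times and temps are a safe choice.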
Re: [R] Wilcoxon signed rank test and its requirements
The central limit theorem doesn't help. It just addresses type I error, not
power.

Frank

On 06/25/2010 04:29 AM, Joris Meys wrote:
> As a remark on your histogram: use fewer breaks! This histogram tells you
> nothing. An interesting function is ?density, e.g.:
>
>     x <- rnorm(250)
>     hist(x, freq=F)
>     lines(density(x), col="red")
>
> See also this ppt, a very nice and short introduction to graphics in R:
> http://csg.sph.umich.edu/docs/R/graphics-1.pdf
>
> 2010/6/25 Atte Tenkanen atte...@utu.fi:
>> Is there anything for me? There is a lot of data, n=2418, but there are
>> also a lot of ties. My sample n≈250-300
>
> You should think about the central limit theorem. Actually, you can just
> use a t-test to compare means, as with those sample sizes the mean is
> almost certainly normally distributed.
>
>> I would like to test whether the mean of the sample differs significantly
>> from the population mean.
>
> According to probability theory, this will be the case in 5% of the cases
> if you repeat your sampling infinitely. But as David asked: why on earth
> do you want to test that?
>
> cheers
> Joris

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] (no subject)
Maybe something like:

    y <- readLines("foo")
    z <- strsplit(y, ",")
    cols <- sort(unique(unlist(z)))  # Assuming this is what you want for column names
    m <- matrix(0, nrow=length(z), ncol=length(cols),
                dimnames=list(as.character(1:length(z)), cols))
    for (i in 1:length(z)) {
      m[i, z[[i]]] <- 1
    }
    print(m)
    #   A B C D E F G O
    # 1 1 1 1 1 0 0 1 0
    # 2 1 0 1 0 1 0 0 1
    # 3 0 0 0 0 0 1 1 0

Hope this helps you a little.

Allan

On 25/06/10 13:00, ricardo.sousa2...@portugalmail.pt wrote:
> Hello,
> I'm new to R, but from what I have read it is an excellent tool. I would
> appreciate your help: I am trying to create an array from reading a text
> file. The idea is to read the file and transform the data into binary
> format. For example, the file has this format:
>
>     A,B,C,D,G
>     A,C,E,O
>     F,G
>
> I want to turn it into this:
>
>       a b c d e f g o
>     1 1 1 1 1 0 0 1 0
>     2 1 0 1 0 1 0 0 1
>     3 0 0 0 0 0 1 0 0
>
> and display it on the monitor.
> Thanks for the help
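The loop above can also be written without explicit indexing. The sketch below is a loop-free variant of the same idea; the input is given inline as a character vector standing in for readLines("foo"), since the real file is not available.

```r
# Stand-in for readLines("foo"); each element is one line of the file.
y <- c("A,B,C,D,G", "A,C,E,O", "F,G")

z <- strsplit(y, ",", fixed = TRUE)
cols <- sort(unique(unlist(z)))

# For every line, mark which of the known column labels occur in it.
m <- t(vapply(z, function(items) as.integer(cols %in% items),
              integer(length(cols))))
dimnames(m) <- list(as.character(seq_along(z)), cols)
print(m)
```

Both versions set G to 1 in the third row, because the third input line is F,G; the table in the original question shows 0 there, which looks like a slip in the hand-written example.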
[R] gsub with regular expression
If I have a text with 7 words per line and I would like to put the first and
second words, joined, in a vector and the rest of the words, one per column,
in a matrix, how can I do it?

First 2 lines of my text file:

    2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
    2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca

Results:

Vector:

    2008/12/31 12:23:31
    2010/02/01 02:35:31

Matrix:

    numero  343.233.233 Rodeo    Vaca  Ruido
    palabra 111.111.222 abejorro Rodeo Vaca

Thks,
Sebastian.
Re: [R] gsub with regular expression
On Fri, Jun 25, 2010 at 10:48 AM, Sebastian Kruk residuo.so...@gmail.com wrote:
> If I have a text with 7 words per line and I would like to put the first
> and second words, joined, in a vector and the rest of the words, one per
> column, in a matrix, how can I do it?
>
> First 2 lines of my text file:
>
>     2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
>     2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca
>
> Results:
>
> Vector:
>
>     2008/12/31 12:23:31
>     2010/02/01 02:35:31
>
> Matrix:
>
>     numero  343.233.233 Rodeo    Vaca  Ruido
>     palabra 111.111.222 abejorro Rodeo Vaca

Here are two solutions. Both solutions are three statements long (read in
the data, display the vector, display the matrix). Replace
textConnection(Lines) with "myfile.dat", say, in each.

1. Here is a sub solution:

    L <- readLines(textConnection(Lines))
    sub("(\\S+ \\S+) .*", "\\1", L)
    sub("\\S+ \\S+ ", "", L)

2. Here is a solution using zoo:

    Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
    2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

    library(zoo)
    z <- read.zoo(textConnection(Lines), index = 1:2,
                  FUN = function(x) paste(x[,1], x[,2]))
    time(z)      # the vector
    coredata(z)  # the matrix

Another possibility would be to convert to chron or POSIXct at the same time
as reading it in:

    # chron
    library(chron)
    z <- read.zoo(textConnection(Lines), index = 1:2,
                  FUN = function(x) as.chron(paste(x[,1], x[,2]),
                                             format = "%Y/%m/%d %H:%M:%S"))

    # POSIXct
    z <- read.zoo(textConnection(Lines), index = 1:2,
                  FUN = function(x) as.POSIXct(paste(x[,1], x[,2]),
                                               format = "%Y/%m/%d %H:%M:%S"))
Re: [R] gsub with regular expression
help(strsplit) is your friend, for example:

    t <- c("2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido",
           "2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca")
    m <- do.call(rbind, strsplit(t, "[[:space:]]+"))  # Matrix of all the data
    v <- paste(m[, 1], m[, 2])                        # The vector
    m <- m[, -c(1,2)]                                 # The matrix

Hope this helps a little.

Allan

On 25/06/10 15:48, Sebastian Kruk wrote:
> If I have a text with 7 words per line and I would like to put the first
> and second words, joined, in a vector and the rest of the words, one per
> column, in a matrix, how can I do it?
>
> First 2 lines of my text file:
>
>     2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
>     2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca
>
> Results:
>
> Vector:
>
>     2008/12/31 12:23:31
>     2010/02/01 02:35:31
>
> Matrix:
>
>     numero  343.233.233 Rodeo    Vaca  Ruido
>     palabra 111.111.222 abejorro Rodeo Vaca
>
> Thks,
> Sebastian.
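A third route, sketched here with base R only, is to let read.table() do the splitting. It assumes every line has exactly seven whitespace-separated fields, as stated in the question; the variable names d, v, m and stamp are illustrative.

```r
Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

# Read every field as text so that nothing is converted to numbers or factors.
# For a real file, use read.table("myfile.dat", colClasses = "character").
d <- read.table(text = Lines, colClasses = "character")

v <- paste(d[[1]], d[[2]])        # the vector: date and time joined
m <- as.matrix(d[, -(1:2)])       # the matrix: remaining five words per line
dimnames(m) <- NULL

# Optionally parse the joined strings as date-times.
stamp <- as.POSIXct(v, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
```

Compared with strsplit(), this fails loudly when a line has a different number of fields, which can be useful as a sanity check on the input.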
[R] Fortune?
"On average, any data manipulation that can be described in a sentence or two of English can be programmed in one line in R. If you find yourself writing a long 'for' loop to do something that sounds simple, take a step back and research if an existing combination of functions can easily handle your request."
-- Erik Iverson

I nominate this for a Fortune. (email thread in which it appeared below)

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erik Iverson
Sent: Thursday, June 24, 2010 4:14 PM
To: john polo
Cc: r-help@r-project.org
Subject: Re: [R] write a loop for tallies

On average, any data manipulation that can be described in a sentence or two of English can be programmed in one line in R. If you find yourself writing a long 'for' loop to do something that sounds simple, take a step back and research if an existing combination of functions can easily handle your request.

john polo wrote:

Dear R users,

I have a list of numbers such as

n
[1] 3000 4000 5000 3000 5000 6000 4000 5000 7000 5000 6000 7000

and i'd like to set up a loop that will keep track of the number of occurrences of each of the values that occur in the list, e.g.

3000: 2
4000: 2
5000: 4

I came up with the following:

a <- for (i in 1:length(n)) {
r <- 0
s <- 0
t <- 0
u <- 0
v <- 0
ifelse(n[i] == 3000, r <- r+1,
ifelse(n[i] == 4000, s <- r+1,
ifelse(n[i] == 5000, t <- r+1,
ifelse(n[i] == 6000, u <- r+1,
ifelse(n[i] == 7000, v <- r+1, NA)))))
r <- sum(r)
s <- sum(s)
t <- sum(t)
u <- sum(u)
v <- sum(v)
cat("r = ", r, "\n")
cat("s = ", s, "\n")
cat("t = ", t, "\n")
cat("u = ", u, "\n")
cat("v = ", v, "\n")
}

However, this is the output:

r = 1 s = 0 t = 0 u = 0 v = 0
r = 0 s = 1 t = 0 u = 0 v = 0
r = 0 s = 0 t = 1 u = 0 v = 0
r = 1 s = 0 t = 0 u = 0 v = 0
r = 0 s = 0 t = 1 u = 0 v = 0
r = 0 s = 0 t = 0 u = 1 v = 0
r = 0 s = 1 t = 0 u = 0 v = 0
r = 0 s = 0 t = 1 u = 0 v = 0
r = 0 s = 0 t = 0 u = 0 v = 1
r = 0 s = 0 t = 1 u = 0 v = 0
r = 0 s = 0 t = 0 u = 1 v = 0
r = 0 s = 0 t = 0 u = 0 v = 1

How should i write this loop, please? I've tried variations with if instead of ifelse and receive errors about unexpected { or unexpected ).

regards,
john
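As an illustration of the "one line in R" remark, the tally asked for above can be sketched with table(). The loop variant is only there for comparison; its variable names (tally, key) are made up for this sketch and are not from the thread. The main difference from the posted loop is that the counters are created once, before the loop starts.

```r
n <- c(3000, 4000, 5000, 3000, 5000, 6000, 4000, 5000, 7000, 5000, 6000, 7000)

# One-line tally: table() counts how often each distinct value occurs.
counts <- table(n)
print(counts)

# Loop version, for comparison. The counters are created once, before the
# loop, so they are not reset to zero on every iteration.
tally <- setNames(integer(length(unique(n))), sort(unique(n)))
for (value in n) {
  key <- as.character(value)      # counters are looked up by name
  tally[key] <- tally[key] + 1L
}
print(tally)
```

Both objects hold the same counts; table() additionally copes with values that were not anticipated when the code was written.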
Re: [R] Wilcoxon signed rank test and its requirements
2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu:

The central limit theorem doesn't help. It just addresses type I error, not power.

Frank

I don't think I stated otherwise. I am aware of the fact that the Wilcoxon has an Asymptotic Relative Efficiency greater than 1 compared to the t-test in case of skewed distributions. Apologies if I caused more confusion.

The problem with the Wilcoxon is twofold, as far as I understood this data correctly:
- there are quite some ties
- the Wilcoxon assumes under the null that the distributions are the same, not only the location. The influence of unequal variances and/or shapes of the distribution is enhanced in the case of unequal sample sizes.

The central limit theorem ensures that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two samples from the described population. Your mileage may vary.

Cheers
Joris

On 06/25/2010 04:29 AM, Joris Meys wrote:

As a remark on your histogram: use fewer breaks! This histogram tells you nothing. An interesting function is ?density, e.g.:

x <- rnorm(250)
hist(x, freq = FALSE)
lines(density(x), col = "red")

See also this ppt, a very nice and short introduction to graphics in R:
http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen atte...@utu.fi:

Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300

You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed.

i would like to test, whether the mean of the sample differ significantly from the population mean.

According to probability theory, this will be in 5% of the cases if you repeat your sampling infinitely. But as David asked: why on earth do you want to test that?

cheers
Joris

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
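To make the two options debated in this thread concrete, here is a minimal sketch that runs both a one-sample t-test and a Wilcoxon signed rank test against a known population mean. The data are simulated stand-ins, not the tonal distances from the thread: the value set 0:6, the probabilities and the seed are assumptions chosen only to produce many ties. Note that the Wilcoxon test addresses the centre of symmetry (pseudo-median) rather than the mean, which is part of what the posters are disagreeing about.

```r
set.seed(1)

# Simulated "population" of 2418 discrete values, so ties are frequent.
population <- sample(0:6, 2418, replace = TRUE,
                     prob = c(0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05))

# A "region" of 275 values picked from the piece (here: a random subset).
region <- population[sample(length(population), 275)]
mu0 <- mean(population)

# One-sample t-test of H0: mean(region) equals the population mean.
tt <- t.test(region, mu = mu0)

# Wilcoxon signed rank test of H0: the distribution is symmetric about mu0.
# exact = FALSE requests the normal approximation, which is what R falls
# back to anyway when ties are present.
wt <- wilcox.test(region, mu = mu0, exact = FALSE)

print(tt$p.value)
print(wt$p.value)
```

The two p-values need not agree, since the tests answer slightly different questions; the sketch only shows how each call is set up.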
[R] Fast and simple tool for re-sampling of asynchronous time series ?
Hi all,

I'm looking for a function which could do some fast and simple re-sampling of asynchronous time series.

Below is a MCE of the kind of algorithm I need. As you can see, it's quite crude, but it's enough for my current needs. The only problem is that it is quite slow on real use cases. I've got a C version which is much faster, but I'd like to have a pure-R program.

Any pointer to the relevant part of the doc of one of the time-series packages? Any suggestion or advice?

Thanks in advance,

B. Piguet.

Here is the example:

Tx <- seq(1, 50, 0.5)
Tx <- Tx + rnorm(length(Tx), 0, 0.1)
X <- sin(Tx/10.0) + sin(Tx/5.0) + rnorm(length(Tx), 0, 0.1)

Ty <- seq(1, 50, 0.)
Ty <- Ty + rnorm(length(Ty), 0, 0.02)
Y <- sin(Ty/10.0) + sin(Ty/5.0) + rnorm(length(Ty), 0, 0.1)

w <- 0.25

Y_sync <- rep(NA, length(Tx))
for (i in 1:length(Tx)) {
  T_min <- Tx[i] - w
  T_max <- Tx[i] + w
  Y_sync[i] <- mean(Y[Ty >= T_min & Ty <= T_max])
}

diff = X - Y_sync
print(summary(diff))
print(summary(lm(Y_sync~X)))
plot(diff~Tx, type="l")
Re: [R] Fortune?
On Fri, Jun 25, 2010 at 4:17 PM, Bert Gunter gunter.ber...@gene.com wrote:

"On average, any data manipulation that can be described in a sentence or two of English can be programmed in one line in R. If you find yourself writing a long 'for' loop to do something that sounds simple, take a step back and research if an existing combination of functions can easily handle your request."

I've already fallen into this trap. A couple of hours of reading Rnews on the apply() family would have saved me a month or so of for() programming.

Liviu
Re: [R] gsub with regular expression
On Fri, Jun 25, 2010 at 11:11 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Fri, Jun 25, 2010 at 10:48 AM, Sebastian Kruk residuo.so...@gmail.com wrote:

If I have a text with 7 words per line and I would like to put first and second word joined in a vector and the rest of words one per column in a matrix how can I do it?

First 2 lines of my text file:

2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca

Results:

Vector:
2008/12/31 12:23:31
2010/02/01 02:35:31

Matrix
numero 343.233.233 Rodeo Vaca Ruido
palabra 111.111.222 abejorro Rodeo Vaca

Here are two solutions. Both solutions are three statements long (read in the data, display the vector, display the matrix). Replace textConnection(Lines) with "myfile.dat", say, in each.

1. Here is a sub solution:

L <- readLines(textConnection(Lines))
sub("(\\S+ \\S+) .*", "\\1", L)
sub("\\S+ \\S+ ", "", L)

The last line should be:

as.matrix(read.table(textConnection(sub("\\S+ \\S+ ", "", L)), as.is = TRUE))

3. And a third solution which perhaps is the most obvious:

DF <- read.table(textConnection(Lines), as.is = TRUE)
paste(DF[, 1], DF[, 2]) # vector
as.matrix(DF[-(1:2)]) # matrix
Re: [R] Simple qqplot question
To add to/modify what Joris (and I) previously said:

1. qqplots are not cumulative distribution plots. Hence, as Joris said, the S-shape indicates short tails/bimodality compared to the normal. Why you continue to insist on carrying out normality tests that with so many points obviously will reject is beyond me! The bimodality is what's important. Why is it there? What is it telling you about your data (perhaps some sort of measurement shift...)?

2. My prior suggestion for plotting a reference line -- and Joris's confidence interval recommendations -- are in some sense wrong. The reason is that they give the conditional expectation and confidence intervals thereof of the quantiles of the y distribution conditioned on those of the x. What you probably want is the correlation line. One simple robust estimate of this -- and quick to calculate -- is just to mimic qqline() and calculate the 1st and 3rd quartiles of both distributions and use the line joining the corresponding quartile pairs ((1st,1st) and (3rd,3rd)). I leave the trivial algebra to you -- quantile() gets the quartiles.

Of course, there's a literature on this if you want to do something authoritative -- and perhaps R functions somewhere based on it. Perhaps some kind (and wiser than I) soul will provide references. (However, I doubt that the line so obtained will differ appreciably from my earlier incorrect recommendation, which was probably good enough for eyeballing in most cases.)

Finally, risking hubris again, I would suggest that if the two distributions with so many points really are essentially identical, then this is scientifically uninteresting -- that is, the identity is a logical (and trivial) consequence of the systematic way in which the data were obtained, some sort of software (data collection?) issue, or the like -- i.e. not indicative of a scientifically interesting phenomenon. It might even indicate a problem with the data/measurements.

My reasoning: real variability prohibits such identity. The identical bimodality may be a clue here. Again, note that I know nothing about what you are doing, and you are therefore justified in publicly chastising me for such ignorant speculation if I am wrong. I would welcome comments and criticisms from others on such speculation also.

HTH,

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys
Sent: Friday, June 25, 2010 2:15 AM
To: Ralf B
Cc: R mailing list
Subject: Re: [R] Simple qqplot question

Sorry, missed the two variable thing. Go with the lm solution then, and you can tweak the plot yourself (the confidence intervals are easily obtained via predict(lm.object, interval="prediction") ). The function qq.plot uses robust regression, but in your case normal regression will do.

Regarding the shapes: this just indicates both tails are shorter than expected, so you have a kurtosis greater than 3 (or positive, depending whether you do the correction or not)

Cheers
Joris

On Fri, Jun 25, 2010 at 4:10 AM, Ralf B ralf.bie...@gmail.com wrote:

Short rep: I have two distributions, data and data2; each build from about 3 million data points; they appear similar when looking at densities and histograms. I plotted qqplots for further eye-balling:

qqplot(data, data2, xlab = "1", ylab = "2")

and get an almost perfect diagonal line which means they are in fact very alike. Now I tried to check normality using qqnorm -- and I think I am doing something wrong here:

qqnorm(data, main = "Q-Q normality plot for 1")
qqnorm(data2, main = "Q-Q normality plot for 2")

I am getting perfect S-shaped curves (??) for both distributions. Am I missing something here?

[ASCII sketch of an S-shaped Q-Q curve]

Thanks,
Ralf

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
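The "trivial algebra" for the quartile-based reference line described in point 2 above might look like the following sketch. The two samples are simulated bimodal data standing in for Ralf's data and data2 (sample sizes, means and the seed are assumptions), and the names qx, qy, slope and intercept are made up for the illustration.

```r
set.seed(42)

# Stand-ins for the two large samples in the thread: both bimodal.
data  <- c(rnorm(5000, mean = -2), rnorm(5000, mean = 2))
data2 <- c(rnorm(5000, mean = -2), rnorm(5000, mean = 2))

# 1st and 3rd quartiles of each sample.
qx <- quantile(data,  c(0.25, 0.75))
qy <- quantile(data2, c(0.25, 0.75))

# Line through the quartile pairs (1st, 1st) and (3rd, 3rd),
# mimicking what qqline() does against the normal distribution.
slope     <- unname(diff(qy) / diff(qx))
intercept <- unname(qy[1] - slope * qx[1])

qqplot(data, data2, xlab = "1", ylab = "2")
abline(intercept, slope, col = "red")
```

If the two samples really come from the same distribution, the slope should be close to 1 and the intercept close to 0, so the red line roughly coincides with the diagonal.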
Re: [R] Fortune?
Bert,

thanks for the pointer, added to the devel version of fortunes on R-Forge.

thx,
Z

On Fri, 25 Jun 2010, Bert Gunter wrote:

"On average, any data manipulation that can be described in a sentence or two of English can be programmed in one line in R. If you find yourself writing a long 'for' loop to do something that sounds simple, take a step back and research if an existing combination of functions can easily handle your request."
-- Erik Iverson

I nominate this for a Fortune.
Re: [R] Fast and simple tool for re-sampling of asynchronous time series ?
On Fri, 25 Jun 2010, bruno Piguet wrote:

Hi all,

I'm looking for a function which could do some fast and simple re-sampling of asynchronous time series.

Below is a MCE of the kind of algorithm I need. As you can see, it's quite crude, but it's enough for my current needs. The only problem is that it is quite slow on real use cases. I've got a C version which is much faster, but I'd like to have a pure-R program.

Any pointer to the relevant part of the doc of one of the time-series packages? Any suggestion or advice?

Thanks in advance,

B. Piguet.

Here is the example:

Tx <- seq(1, 50, 0.5)
Tx <- Tx + rnorm(length(Tx), 0, 0.1)
X <- sin(Tx/10.0) + sin(Tx/5.0) + rnorm(length(Tx), 0, 0.1)

Ty <- seq(1, 50, 0.)
Ty <- Ty + rnorm(length(Ty), 0, 0.02)
Y <- sin(Ty/10.0) + sin(Ty/5.0) + rnorm(length(Ty), 0, 0.1)

w <- 0.25

Personally, I'd incline towards leaving the next lines to C, perhaps using the inline package.

But if you want a purely R solution, the Bioconductor IRanges package should help. I think the viewMeans() function will handle this loop. See

http://comments.gmane.org/gmane.comp.lang.r.sequencing/1296

for some discussion.

HTH,

Chuck

Y_sync <- rep(NA, length(Tx))
for (i in 1:length(Tx)) {
  T_min <- Tx[i] - w
  T_max <- Tx[i] + w
  Y_sync[i] <- mean(Y[Ty >= T_min & Ty <= T_max])
}

diff = X - Y_sync
print(summary(diff))
print(summary(lm(Y_sync~X)))
plot(diff~Tx, type="l")

Charles C. Berry                             (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu                UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/   La Jolla, San Diego 92093-0901
Re: [R] best way to plot a evolution in time
Hi Nana,

The question is not fully clear to me. Are you looking to plot the (let's call it) family tree of the genes? If so, then using

plot(hclust(gene.dist))

might be a direction for you.

Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Fri, Jun 25, 2010 at 8:56 AM, nana adriana_f...@yahoo.co.uk wrote:

a <- c(2, 5, 15, 16)
b <- c(1, 1, 8, 8)
c <- c(10, 10, 11, 11)
m <- matrix(c(a, b, c), byrow = TRUE, nrow = 3)
rownames(m) <- c("gene a", "gene b", "gene c")
m
gene.dist <- dist(m, method = "euclidean")
gene.dist
[R] 3D generalization of correlogram?
We have 3-dimensional MRI density recordings of tumor tissue and would like to have a measure of patchiness, reflecting cluster size in the tissue. For 2-D slices, correlogram from MASS works well.

Does someone know of a package that provides a 3-D generalization of this measure? Or any alternatives? I assume this is quite a common question in climate research.

Dieter
Re: [R] Fast and simple tool for re-sampling of asynchronous time series ?
On Fri, Jun 25, 2010 at 11:34 AM, bruno Piguet bruno.pig...@gmail.com wrote:

Hi all,

I'm looking for a function which could do some fast and simple re-sampling of asynchronous time series.

Below is a MCE of the kind of algorithm I need. As you can see, it's quite crude, but it's enough for my current needs. The only problem is that it is quite slow on real use cases. I've got a C version which is much faster, but I'd like to have a pure-R program.

Any pointer to the relevant part of the doc of one of the time-series packages? Any suggestion or advice?

Thanks in advance,

B. Piguet.

Here is the example:

Tx <- seq(1, 50, 0.5)
Tx <- Tx + rnorm(length(Tx), 0, 0.1)
X <- sin(Tx/10.0) + sin(Tx/5.0) + rnorm(length(Tx), 0, 0.1)

Ty <- seq(1, 50, 0.)
Ty <- Ty + rnorm(length(Ty), 0, 0.02)
Y <- sin(Ty/10.0) + sin(Ty/5.0) + rnorm(length(Ty), 0, 0.1)

w <- 0.25

Y_sync <- rep(NA, length(Tx))
for (i in 1:length(Tx)) {
  T_min <- Tx[i] - w
  T_max <- Tx[i] + w
  Y_sync[i] <- mean(Y[Ty >= T_min & Ty <= T_max])
}

diff = X - Y_sync
print(summary(diff))
print(summary(lm(Y_sync~X)))
plot(diff~Tx, type="l")

This isn't substantially different than what you have but does replace the explicit loop and associated indexing with an implicit loop:

sapply(Tx, function(tx) mean(Y[Ty >= tx-w & Ty <= tx+w]))
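Neither the explicit loop nor sapply() avoids scanning all of Ty for every element of Tx. One pure-R alternative, which nobody in the thread proposed and which is offered here only as a sketch, is to sort Ty once and obtain every window sum from a cumulative sum, with findInterval() locating the window edges. The function name window_mean is hypothetical, the step of 0.1 for Ty is an assumption (the step is truncated in the original post), and the left.open argument needs R 3.3.0 or later. Y is assumed to contain no NA values.

```r
# Mean of Y over the window [tx - w, tx + w] for every tx in Tx,
# computed from cumulative sums instead of rescanning Ty each time.
window_mean <- function(Tx, Ty, Y, w) {
  ord <- order(Ty)                 # findInterval() needs sorted breakpoints
  Ty <- Ty[ord]
  Y  <- Y[ord]
  cs <- c(0, cumsum(Y))            # cs[k + 1] is the sum of the first k values
  hi <- findInterval(Tx + w, Ty)                    # count of Ty <= tx + w
  lo <- findInterval(Tx - w, Ty, left.open = TRUE)  # count of Ty <  tx - w
  (cs[hi + 1] - cs[lo + 1]) / (hi - lo)             # empty window gives NaN
}

set.seed(1)
Tx <- seq(1, 50, 0.5)
Tx <- Tx + rnorm(length(Tx), 0, 0.1)
Ty <- seq(1, 50, 0.1)              # assumed step, see the note above
Ty <- Ty + rnorm(length(Ty), 0, 0.02)
Y  <- sin(Ty/10.0) + sin(Ty/5.0) + rnorm(length(Ty), 0, 0.1)
w  <- 0.25

fast_result <- window_mean(Tx, Ty, Y, w)
loop_result <- sapply(Tx, function(tx) mean(Y[Ty >= tx - w & Ty <= tx + w]))
print(all.equal(fast_result, loop_result))
```

The results agree with the loop up to floating-point rounding, because differences of cumulative sums are not bit-identical to summing each window separately.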
Re: [R] (no subject)
You posted the exact same question several days ago (June 17) under a different name. You got two perfectly good and adequate answers.

/Berend
Re: [R] Wilcoxon signed rank test and its requirements
You still are stating the effect of the central limit theorem incorrectly. Please see my previous note.

Frank

On 06/25/2010 10:27 AM, Joris Meys wrote:

2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu:

The central limit theorem doesn't help. It just addresses type I error, not power.

Frank

I don't think I stated otherwise. I am aware of the fact that the Wilcoxon has an Asymptotic Relative Efficiency greater than 1 compared to the t-test in case of skewed distributions. Apologies if I caused more confusion.

The problem with the Wilcoxon is twofold, as far as I understood this data correctly:
- there are quite some ties
- the Wilcoxon assumes under the null that the distributions are the same, not only the location. The influence of unequal variances and/or shapes of the distribution is enhanced in the case of unequal sample sizes.

The central limit theorem ensures that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two samples from the described population. Your mileage may vary.

Cheers
Joris

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] Handouts / Reports or just simply printing text to PDF?
Hi Ralf,

?pdf and ?png are good places to start.

There is also R2wd:
http://cran.r-project.org/web/packages/R2wd/index.html
for exporting R output to Word. I wrote a short tutorial session for it here:
http://www.r-statistics.com/2010/05/exporting-r-output-to-ms-word-with-r2wd-an-example-session/

Best,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Fri, Jun 25, 2010 at 5:35 AM, Ralf B ralf.bie...@gmail.com wrote:

I assume R won't easily generate nice reports (unless one starts using Sweave and LaTeX) but perhaps somebody here knows a package that can create report-like output for special cases? How can I simply plot output into PDF? Perhaps you know a package I should check out? What do you guys do to create handouts (before actually publishing)?

Thanks in advance,
Ralf
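As a starting point for the ?pdf suggestion above, this sketch writes a two-page PDF with base R only: one page with a plot and one page with printed text. The file name handout.pdf, the A4 page size and the simulated data are assumptions made for the example; it is a crude handout, not a replacement for Sweave.

```r
set.seed(1)
x <- rnorm(100)

out_file <- file.path(tempdir(), "handout.pdf")

pdf(out_file, width = 8.27, height = 11.69)   # A4 portrait, in inches

# Page 1: an ordinary plot goes straight to the PDF device.
hist(x, main = "Histogram of x")

# Page 2: printed output, captured as text and drawn on an empty page.
plot.new()
txt <- capture.output(summary(x))
text(0, 1, paste(txt, collapse = "\n"), adj = c(0, 1), family = "mono")

dev.off()                                     # closes and writes the file
print(out_file)
```

Everything plotted between pdf() and dev.off() ends up in the file, one page per high-level plot call.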
Re: [R] ggplot2: deterministic position_jitter geom_line with position_jitter
I'm having the same problem as Stephan (see below), but what I'm trying to jitter is not a numeric vector, but a factor. How do I proceed? (Naively jittering a factor makes it numeric, no longer factor, so I don't get the custom ordering which conveniently comes with using a factor. I'm not sure how I would simulate that custom ordering with the jittered vector ... I couldn't find anything online about jittering factors, but maybe I just wasn't searching cleverly enough.)

You'll probably need to reorder the factor, then jitter it, and then add custom labels with scale_continuous().

I think I see how to resolve this problem in general (two displays of the same jittered data), but it requires basically a complete rewrite of ggplot2, so it's unlikely to appear before ggplot3.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
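The recipe above (reorder the factor, jitter it, then relabel the axis) can be sketched as follows. This version uses base graphics rather than ggplot2 so that it runs without extra packages; the data, the level names and the jitter amount are invented for the example. With ggplot2 the stored x_jit column could presumably be mapped to x and labelled through the breaks and labels arguments of scale_x_continuous(), but that variant is not shown or tested here.

```r
set.seed(123)

# A factor with a custom (non-alphabetical) level order.
f <- factor(c("low", "high", "mid", "low", "mid", "high"),
            levels = c("low", "mid", "high"))
y <- c(1.2, 3.4, 2.1, 0.9, 2.5, 3.0)

# Convert to the integer level codes, then jitter ONCE and keep the result,
# so that points and lines drawn later share exactly the same positions.
x_num <- as.integer(f)
x_jit <- jitter(x_num, amount = 0.2)

plot(x_jit, y, xaxt = "n", xlab = "group", xlim = c(0.5, 3.5))
lines(x_jit[order(x_jit)], y[order(x_jit)], col = "grey")

# Put the factor labels back on the numeric axis, in level order.
axis(1, at = seq_along(levels(f)), labels = levels(f))
```

Because the jittered positions are computed once and stored, every layer that uses x_jit is deterministic with respect to the others, which is the point of the original question.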
[R] variograms and kriging
Hello

Trying to develop variograms and kriged surfaces from a point file. Here is what I've done so far.

library(gstat) # also loads library(sp)
library(lattice)

soilpts$x <- soilpts$UTM_X
soilpts$y <- soilpts$UTM_Y
soil.dat <- subset(soilpts, select=c(x, y, Area, BulkDensity, LOI, TP, TN, TC, Total_Mg))
dim(soil.dat)
[1] 12927

coordinates(soil.dat) <- ~ x+y
gridded(soil.dat) <- TRUE
Warning messages:
1: In points2grid(points, tolerance, round, fuzz.tol) :
  grid has empty column/rows in dimension 1
2: In points2grid(points, tolerance, round, fuzz.tol) :
  grid has empty column/rows in dimension 2

class(soil.dat)
[1] "SpatialPixelsDataFrame"
attr(,"package")
[1] "sp"

bbox(soil.dat)
      min     max
x  476819  575981
y 2785749 2948128

soil.dat[1:3,]
suggested tolerance minimum: 0.165318957771788
Error in points2grid(points, tolerance, round, fuzz.tol) :
  dimension 1 : coordinate intervals are not constant

The last error message and the warnings returned above lead me to think that the spatial sampling locations must be regular and equally spaced. My data, though, are not.

I have spent the morning trying to figure this out, going back and forth among many spatial packages that can do variograms and kriging. Without a good road map to follow, however, I've had to do a number of about-faces. Not sure which way to turn now. Can anyone provide guidance?

Using Windows XP and R 2.11.1, packages updated today.

Thanks
Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034
steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
There is a freely downloadable and very relevant (and readable) book at
https://ccrma.stanford.edu/~dattorro/mybook.html
"Convex Optimization and Euclidean Distance Geometry", and it indeed names EDMA as a form of multidimensional scaling (or maybe the other way around). You should have a look at the code for multidimensional scaling in R.

Kjetil

On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote:

thanks for your interests Joris

Gokhan OCAKOGLU
Uludag University Faculty of Medicine
Department of Biostatistics
http://www20.uludag.edu.tr/~biostat/ocakoglui.htm
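For readers who want to look at multidimensional scaling in R as suggested above, base R ships classical (metric) MDS as cmdscale() in the stats package. The sketch below is classical MDS only, not an implementation of EDMA; the five 3-D "landmarks" are random numbers invented for the example.

```r
set.seed(7)

# Five hypothetical landmarks in 3-D (rows are landmarks, columns x, y, z).
landmarks <- matrix(rnorm(5 * 3), ncol = 3)

# Euclidean distance matrix between landmarks: the only input MDS sees.
d <- dist(landmarks)

# Classical MDS: recover a 3-D configuration from the distances alone.
fit <- cmdscale(d, k = 3)

# The recovered coordinates differ from the originals by a rotation,
# reflection and shift, but the inter-landmark distances are reproduced.
print(all.equal(as.vector(dist(fit)), as.vector(d)))
```

This illustrates why distance-matrix methods are insensitive to how a specimen was positioned: only the pairwise distances carry information.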
Re: [R] variograms and kriging
Please disregard. I've posted to the wrong site.

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034
steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147

steve_fried...@nps.gov
Sent by: r-help-boun...@r-project.org
To: r-help@r-project.org
06/25/2010 01:38 PM
Subject: [R] variograms and kriging
Re: [R] Sweave: The opposite of tangle
Thanks! That was exactly what I was looking for.

Best, Stefan

On Fri, Jun 25, 2010 at 12:37 PM, Kevin E. Thorpe kevin.tho...@utoronto.ca wrote:

Kevin E. Thorpe wrote:

stefan.d...@gmail.com wrote:

Hi, I am using Sweave to write an article. If I want to convert the *.rnw to a *.tex file I have to run Sweave, which might take a long time. Is there a way to get a tex file as the result without evaluating the R chunks, i.e. the opposite of tangle (which just gives the R chunks)? Thanks, Stefan

This is untested, but does

    Sweave("file.rnw", eval=FASLE)

do what you want?

That should be FALSE above. Don't post before coffee.

-- Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Dalla Lana School of Public Health University of Toronto email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
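The suggestion above can be tried on a throwaway file. The sketch below is an illustration only: the file name demo.Rnw and its contents are made up, and it assumes the default RweaveLatex driver, where eval is one of the supported chunk options that can be passed through Sweave().

```r
# Work in a temporary directory so nothing is written next to real files
old_wd <- setwd(tempdir())

# Hypothetical minimal Sweave source; the chunk stands in for slow code
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<slowchunk>>=",
  "Sys.sleep(60)  # would take a minute if it were evaluated",
  "x <- 1",
  "@",
  "Done.",
  "\\end{document}"
), "demo.Rnw")

# eval = FALSE becomes the default for every chunk, so no R code is run;
# the chunk source is still echoed into the .tex file
tex_file <- Sweave("demo.Rnw", eval = FALSE, quiet = TRUE)

setwd(old_wd)
print(tex_file)
```

Note that with eval = FALSE any figures or inline results the chunks would have produced are missing from the .tex output, so the file may not compile if the text refers to them.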
[R] Forcing scalar multiplication.
I am trying to check the results from an eigendecomposition and I need to force a scalar multiplication. The fundamental equation is Ax = lx, where 'l' is the eigenvalue and x is the eigenvector corresponding to that eigenvalue. R returns the eigenvalues as a vector (e <- eigen(A); e$values). So in order to check the result I would multiply the eigenvalues ('l') by the eigenvectors. But unless I do it one by one (say e$values[1] * e$vectors[,1]), R tries a matrix multiplication and that is not what I want. I would like a matrix that is formed by the SCALAR multiplication of each of the values by the corresponding eigenvector. How can I force such a multiplication?

Thank you.

Kevin
Re: [R] Wilcoxon signed rank test and its requirements
Atte,

I would not be surprised if you got lost and confused by the certainly interesting methodological discussion that has been going on in this thread. Since the helpers do not seem to converge/agree, I propose that you use a different nonparametric approach: the bootstrap. The important thing about the bootstrap is that you do not have to be concerned with the questions that have been discussed in this thread.

In the bootstrap you repeatedly draw samples with replacement from your data and compute the statistic you are interested in (for you this is the mean). The beauty of this approach is i) that the bootstrap distribution of the mean is approximately normal and ii) that you can directly compare the quantiles/confidence intervals of the bootstrap distributions.

Let's say you have x and y, which both come from Poisson distributions with relatively low means. Note that this resembles your data in that the distributions are asymmetric and contain a considerable number of ties.

    # set seed for random number generation
    set.seed(123)
    # simulate x and y (these would be your data)
    x=rpois(100,3)
    y=rpois(100,4)
    # plot histograms for x and y
    par(mfcol=c(1,2))
    hist(x,breaks=length(unique(x)))
    hist(y,breaks=length(unique(y)))

Now we sample with replacement from x and y (i.e., we draw one observation from x and one from y, and afterwards we put the drawn observation back into x and y, respectively). For each bootstrap of x and y, respectively, we sample exactly as many observations as there are in x and y, respectively (here 100). We then compute the statistic of interest for this bootstrap (here the mean). We repeat this process many times (here 1000).

    n=1000 # number of bootstraps to draw
    x.boot1=numeric(n)
    y.boot1=numeric(n)
    for(i in 1:n){
      x.boot1[i]=mean(sample(x,length(x),replace=TRUE))
      y.boot1[i]=mean(sample(y,length(y),replace=TRUE))
    }

Doing this, we obtain the bootstrap distribution of the mean of x and y, respectively. Note that the bootstrap distribution is approximately normally distributed and unbiased (the latter automatically because we bootstrap the mean):

    par(mfcol=c(1,2))
    hist(x.boot1)
    hist(y.boot1)

The simple(st) way of comparing these distributions is by checking whether their confidence intervals overlap or not. You get the 95-percent confidence intervals by

    quantile(x.boot1,probs=c(0.025,0.975))
    quantile(y.boot1,probs=c(0.025,0.975))

If they do not overlap, you would conclude that they are significantly different. In the one-sample case, you would just check whether the value of interest is within or outside the confidence interval.

Finally, note that the little loop that we have programmed to draw the bootstraps is already implemented in an R package. Using the bootstrap package, you could draw the bootstraps analogously by:

    library(bootstrap)
    x.boot2=bootstrap(x,nboot=1000,mean)
    y.boot2=bootstrap(y,nboot=1000,mean)

The bootstrapped means are then stored in x.boot2$thetastar and y.boot2$thetastar.

Hope that helps,
Daniel

-- View this message in context: http://r.789695.n4.nabble.com/Wilcoxon-signed-rank-test-and-its-requirements-tp2266165p2268801.html Sent from the R help mailing list archive at Nabble.com.
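The one-sample case mentioned above (is a reference value inside the bootstrap interval?) can be sketched in base R as follows. The data are simulated stand-ins, not Atte's tonal distances, and the sizes 2418 and 250 are borrowed from the thread only to make the example look familiar.

```r
set.seed(123)

# Hypothetical stand-ins: a "population" of all values in the piece and one
# region of interest drawn from a slightly shifted distribution
population <- rpois(2418, 3)
region     <- rpois(250, 3.5)

# Bootstrap the mean of the region: resample with replacement, same size
boot_means <- replicate(2000, mean(sample(region, replace = TRUE)))

# Percentile 95% interval for the region mean
ci <- quantile(boot_means, probs = c(0.025, 0.975))

# Is the population mean outside that interval?
pop_mean <- mean(population)
outside  <- pop_mean < ci[1] || pop_mean > ci[2]

print(ci)
print(outside)
```

This treats the population mean as a fixed reference value; whether that is appropriate depends on how the region was chosen, which is a question for the thread rather than for the code.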
[R] Trying to tile wireframe plots (using lattice package)
Hi all,

I'm trying to print a number of wireframe plots (generated using the lattice package), and I want them to appear in a two-by-two matrix along with some other (standard) plots. In other words, I am trying to create a subplot or tiled plot that works for wireframes. I've tried the methods discussed in http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21238.html but while they work for hist(), they don't work for wireframe(). I've also tried split.screen() and layout() - see below:

    ## Example of what I'm trying to do
    library(lattice)
    layout(matrix(c(1,2,3,4), 2, 2, byrow = TRUE))
    # Top-left, as expected
    plot(rnorm(100),rnorm(100))
    # Top-right, as expected
    plot(rnorm(100),rnorm(100))
    # But the volcano fills the whole device ...
    wireframe(volcano)
    ## End of example

All has been to no avail up until now. I'd be grateful for any suggestions you may have.

Best,
Magnus

ps. If there is a way to do this using intermediate files (saving each plot as a PS file, and then tiling multiple PS files within the same device), that would be a totally acceptable solution for me as well.
Re: [R] Wilcoxon signed rank test and its requirements
Let me see if I understand. You actually have the data for the whole population (the entire piece) but you have some pre-defined sections that you want to see if they differ from the population, or more meaningfully they are different from a randomly selected set of measures. Is that correct? If so, since you have the entire population of interest you can create the actual sampling distribution (or a good approximation of it). Just take random samples from the population of the given size (matching the subset you are interested in) and calculate the means (or other value of interest), probably 10,000 to 1,000,000 samples. Now compare the value from your predefined subset to the set of random values you generated to see if it is in the tail or not. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Atte Tenkanen Sent: Thursday, June 24, 2010 11:04 PM To: David Winsemius Cc: R mailing list Subject: Re: [R] Wilcoxon signed rank test and its requirements The values come from this kind of process: The musical composition is segmented into so-called 'pitch-class segments' and these segments are compared with one reference set with a distance function. Only some distance values are possible. These distance values can be averaged over music bars which produces smoother distribution and the 'comparison curve' that illustrates the distances according to the reference set through a musical piece result in more readable curve (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use original values. then, I want to pick only some regions from the piece and compare those values of those regions, whether they are higher than the mean of all values. Atte On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. 
My sample n≈250-300

I do not understand why there should be so many ties. You have not described the measurement process or units. ( ... although you offer a glimpse without much background later.)

I would like to test whether the mean of the sample differs significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

The histogram of the population looks like the attached histogram, what test should I use? No choices? This distribution comes from a musical piece and the values are 'tonal distances'. http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population. -- David.

Atte

On 06/24/2010 12:40 PM, David Winsemius wrote:

On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Thanks. What I have had to ask is that how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation? when it is said: "The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed reviewing "Nonparametric statistical methods for complete and censored data" by M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.
To add to what David and others have said, look at the kernel that the U-statistic associated with the WSR test uses: the indicator (0/1) of x_i + x_j > 0. So WSR tests H0: p = 0.5, where p is the probability that the average of a randomly chosen pair of values is positive. [If there are ties this probably needs to be worded as P[x_i + x_j > 0] = P[x_i + x_j < 0], i neq j.]

Frank

-- Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University
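Greg Snow's suggestion at the top of this message (build the sampling distribution directly by drawing random subsets of the same size from the whole piece) can be sketched like this. The values and the choice of region are simulated and hypothetical; only the sizes echo the thread.

```r
set.seed(1)

# Hypothetical population: 2418 discrete "distance" values with many ties
population <- sample(0:6, 2418, replace = TRUE)

# Hypothetical predefined region of the piece (250 consecutive segments)
region_idx <- 501:750
observed   <- mean(population[region_idx])

# Sampling distribution of the mean for random subsets of the same size,
# drawn without replacement from the whole population
null_means <- replicate(10000, mean(sample(population, length(region_idx))))

# Proportion of random subsets with a mean at least as large as the region's
p_upper <- mean(null_means >= observed)
print(p_upper)
```

A small p_upper would indicate that the region's mean sits in the upper tail of what random subsets produce; musical segments are unlikely to be independent, so this is a descriptive comparison rather than a formal test.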
Re: [R] Trying to tile wireframe plots (using lattice package)
The layout function is base graphics, wireframe from lattice is grid based and they don't play well together without extra effort. The simplest option will probably be to look at the help page for print.trellis, specifically the split and more arguments. Then look at the examples to see if this works for you in place of layout. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Magnus Torfason Sent: Friday, June 25, 2010 12:50 PM To: r-help@r-project.org Subject: [R] Trying to tile wireframe plots (using lattice package) Hi all, I'm trying to print a number of wireframe plots (generated using the lattice package), and I want them to appear in a two-by two matrix along with some other (standard) plots. In other words I am trying to create a subplot or tiled plot that works for wireframes. I've tried the methods discussed in: http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21238.html but while they work for hist(), they don't work for wireframe(). I've also tried split.screen() and layout() - see below: ## Example of what I'm trying to do library(lattice) layout(matrix(c(1,2,3,4), 2, 2, byrow = TRUE)) # Top-left, as expected plot(rnorm(100),rnorm(100)) # Top-right, as expected plot(rnorm(100),rnorm(100)) # But the volcano fills the whole the device ... wireframe(volcano) ## End of example All has been to no avail up until now. I'd be grateful for any suggestions you may have. Best, Magnus ps. If there is a way to do this using intermediate files (saving each plot as a PS file, and then tiling multiple PS files within the same device), that would be a totally acceptable solution for me as well. 
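A minimal sketch of the split/more approach described in the reply. It assumes the "standard" plots can be redrawn as lattice plots (xyplot here instead of plot), since print.trellis only positions lattice objects; the output goes to a temporary PDF so the example runs without a screen device.

```r
library(lattice)

# Hypothetical scatter data standing in for the base plot() calls
set.seed(1)
dat <- data.frame(x = rnorm(100), y = rnorm(100))

p1 <- xyplot(y ~ x, data = dat)          # top-left
p2 <- xyplot(x ~ y, data = dat)          # top-right
p3 <- wireframe(volcano)                 # bottom-left
p4 <- wireframe(volcano, shade = TRUE)   # bottom-right

out <- tempfile(fileext = ".pdf")
pdf(out)
# split = c(column, row, number of columns, number of rows)
# more = TRUE keeps the page open for the next plot
print(p1, split = c(1, 1, 2, 2), more = TRUE)
print(p2, split = c(2, 1, 2, 2), more = TRUE)
print(p3, split = c(1, 2, 2, 2), more = TRUE)
print(p4, split = c(2, 2, 2, 2), more = FALSE)
dev.off()
```

Mixing genuine base graphics with lattice output on one page needs the extra effort mentioned in the reply (for example via the grid package) and is not shown here.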
[R] Delete rows in the data frame by limiting values in two columns
Hi, folks,

Finally Friday~~ Here comes the question:

    x=c('germany','poor italy','usa','england','poor italy','japan')
    y=c('Spain','germany','usa','brazil','england','chile')
    s=1:6
    z=3:8
    test=data.frame(x,y,s,z)

Now I am only concerned with the countries ('germany','england','brazil'). I would like to keep the rows where these three countries are involved in either test$x OR test$y. So the result should look as follows (I did this manually):

               x       y s z
    1    germany   Spain 1 3
    2 poor italy germany 2 4
    3    england  Brazil 4 6
    4 poor italy england 5 7

Is there any code that does this? Thanks a lot in advance.
Re: [R] Wilcoxon signed rank test and its requirements
On Thu, 24 Jun 2010, Atte Tenkanen wrote:

On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Thanks. What I have had to ask is that how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation? when it is said: "The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed reviewing "Nonparametric statistical methods for complete and censored data" by M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate. -- David.

Thanks. Unfortunately, I can't follow the reference at all, but I read this in that way that I can be carefree as far as the underlying distribution is concerned? Is there any other authoritative reference where that is just stated in a way "test does not require that the underlying distributions be symmetric or normal".

The statement from GraphPad is correct, but for a different question. Let me expound.

First let us consider means: If you have paired samples X1..Xn and Y1..Yn you could ask if the mean of X is equal to the mean of Y, or if the mean of (X-Y) is zero. These are equivalent questions, because of the way the mean is defined. So the paired t-test, which answers the first question, and the one-sample t-test, which answers the second question, are equivalent. They have no assumptions (other than sufficient sample size for the means to be Normally distributed).

Now, let us consider medians. If you have paired samples X1..Xn and Y1..Yn you could ask if the median of X is equal to the median of Y, or if the median of (X-Y) is zero. The first question cannot be answered by any standard test (though there are ways to do it). The second is answered by the sign test. They are not at all equivalent: it is possible for the median of X to be larger than the median of Y but the median of (X-Y) to be negative. The non-equivalence is true for essentially all statistics except for the mean.

Now, let us consider the Wilcoxon signed-rank test. This can be characterized precisely as a test of the null hypothesis that the median pairwise mean of X-Y is zero. That is, take all n(n-1)/2 pairs of (X-Y)s. Take the mean of each pair to get n(n-1)/2 pairwise means. Take the median of these numbers. The p-value will be 0.5 one-sided or 1.0 two-sided when this median pairwise mean is exactly zero. The median pairwise mean is also sometimes known as the Hodges-Lehmann estimator (though this is strictly speaking a more general term).

As David correctly points out, no assumptions are needed for the Wilcoxon signed-rank test to be a test of *this* null hypothesis. The problem is that this may not be the null hypothesis you care about. As GraphPad correctly points out, the P value will not tell you much about whether the *median* is different than the hypothetical value, because the median is not the same as the median pairwise mean. It is entirely possible for the median difference to be positive and the median pairwise mean difference to be zero or negative.

If you assume that the distribution of differences X-Y is symmetric, then the Wilcoxon signed-rank test also tests the null hypothesis that the median of X-Y is zero (and that the mean of X-Y is zero), because these null hypotheses are equivalent for a symmetric distribution. That's what GraphPad is saying.

You could also assume that the distributions X and Y are stochastically ordered. This basically implies that the direction of difference is the same no matter what location statistic you use to measure it. If X was before some intervention and Y was afterwards you would basically be assuming that the intervention is either beneficial for everyone or harmful for everyone (up to measurement error). Under this assumption, the signed rank test also tells you reliably about differences in medians.

To some extent this is a philosophical issue. My preference is to know exactly what a test is doing and to make these distinctions. Many other people, including reputable experts like Frank Harrell, believe (I think) that simplifying assumptions such as stochastic ordering are a pretty good approximation in a lot of situations, so it isn't necessary to always make these distinctions.

-thomas

Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle
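The difference between the median and the median pairwise mean can be checked numerically. The sketch below uses five made-up differences (not data from this thread), chosen so that the two statistics have opposite signs; it follows the n(n-1)/2 pairs description given in the message.

```r
# Hypothetical paired differences X - Y, invented for illustration
d <- c(-10, -9, 1, 2, 3)

# Ordinary median of the differences
med <- median(d)

# Mean of every distinct pair (i < j): n(n-1)/2 = 10 pairwise means
pair_means <- combn(d, 2, FUN = mean)

# Median of those pairwise means
mpm <- median(pair_means)

# median is 1, median pairwise mean is -3.5
print(c(median = med, median_pairwise_mean = mpm))
```

So for an asymmetric set of differences the sign test (median) and the signed-rank test (median pairwise mean) can point in opposite directions, which is the non-equivalence described above.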
Re: [R] Delete rows in the data frame by limiting values in two columns
Try this:

    test[rowSums(mapply('%in%', test[c('x', 'y')], list(c('germany','england','brazil')))) > 0,]

On Fri, Jun 25, 2010 at 4:00 PM, Yi liuyi.fe...@gmail.com wrote:

Hi, folks, Finally Friday~~ Here comes the question:

    x=c('germany','poor italy','usa','england','poor italy','japan')
    y=c('Spain','germany','usa','brazil','england','chile')
    s=1:6
    z=3:8
    test=data.frame(x,y,s,z)

Now I am only concerned with the countries ('germany','england','brazil'). I would like to keep the rows where these three countries are involved in either test$x OR test$y. So the result should look as follows (I did this manually):

               x       y s z
    1    germany   Spain 1 3
    2 poor italy germany 2 4
    3    england  Brazil 4 6
    4 poor italy england 5 7

Is there any code that does this? Thanks a lot in advance.

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
Re: [R] Delete rows in the data frame by limiting values in two columns
    x=c('germany','poor italy','usa','england','poor italy','japan')
    y=c('Spain','germany','usa','brazil','england','chile')
    s=1:6
    z=3:8
    test=data.frame(x,y,s,z)

Now I am only concerned with the countries ('germany','england','brazil'). I would like to keep the rows where these three countries are involved in either test$x OR test$y. So the result should look as follows (I did this manually):

               x       y s z
    1    germany   Spain 1 3
    2 poor italy germany 2 4
    3    england  Brazil 4 6
    4 poor italy england 5 7

Is there any code that does this?

    ss <- c("germany", "england", "brazil")
    subset(test, x %in% ss | y %in% ss)
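For reference, here are the two answers from this thread run side by side on the poster's data. Nothing is assumed beyond base R; note that the matching is case-sensitive, so the data value 'brazil' matches while a capitalised 'Brazil' would not.

```r
x <- c('germany', 'poor italy', 'usa', 'england', 'poor italy', 'japan')
y <- c('Spain', 'germany', 'usa', 'brazil', 'england', 'chile')
s <- 1:6
z <- 3:8
test <- data.frame(x, y, s, z)

ss <- c("germany", "england", "brazil")

# Answer 1: subset() with an explicit OR over the two columns
keep1 <- subset(test, x %in% ss | y %in% ss)

# Answer 2: apply %in% to each column, keep rows with at least one match
hits  <- mapply(`%in%`, test[c("x", "y")], list(ss))
keep2 <- test[rowSums(hits) > 0, ]

# Both keep original rows 1, 2, 4 and 5
print(keep1)
```

The row names in the result are the original row numbers (1, 2, 4, 5), not the renumbered 1 to 4 shown in the hand-made table.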
Re: [R] Popularity of R, SAS, SPSS, Stata...
I had taken the opposite tack with Google Trends by subtracting keywords like: SAS -shoes -airlines -sonar... but never got as good results as that beautiful "X code for" search. When you see the end-of-semester panic bumps in traffic, you know you're nailing it! I see that there's a car, the R Code Mustang, that adding "for" gets rid of. Thanks for getting me back on a topic that I had given up on!

Bob

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys Sent: Thursday, June 24, 2010 7:56 PM To: Dario Solari Cc: r-help@r-project.org Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

Nice idea, but quite sensitive to search terms, if you compare your result on "... code" with "... code for": http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q

On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote:

First: excuse my English. My opinion: a useful source for measuring popularity can be Google Insights for Search - http://www.google.com/insights/search/# Every person using software like R, SAS, SPSS first needs to learn it. So they probably do a web search for a manual, a tutorial, a guide. One can measure the share of this kind of search query. This kind of result can be useful to determine trends in popularity.
Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide, SPSS tutorial/manual/guide http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20ma n ual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2 B %22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sa s %20manual%22%2B%22sas%20guide%22cmpt=q Example 2: R software, SAS software, SPSS software http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%2 0 software%22%2C%22sas%20software%22cmpt=q Example 3: R code, SAS code, SPSS code http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20cod e %22%2C%22sas%20code%22cmpt=q Example 4: R graph, SAS graph, SPSS graph http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20gr a ph%22%2C%22sas%20graph%22cmpt=q Example 5: R regression, SAS regression, SPSS regression http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss % 20regression%22%2C%22sas%20regression%22cmpt=q Some example are cross-software (learning needs - Example1), other can be biased by the tarditional use of that software (in SPSS usually you don't manipulate graph, i think) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. 
[R] lattice legend
The solution Felix suggested worked: it was indeed helpful to include the line par.settings=list(superpose.symbol=sup.sym) while using auto.key with a customized symbol list in lattice.

Thanks Felix!

Seth
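A minimal sketch of what that line does, assuming a grouped xyplot; the symbol list sup.sym below is made up, and the built-in iris data stand in for the poster's data.

```r
library(lattice)

# Hypothetical customized symbols, one entry per group
sup.sym <- list(pch = c(15, 17, 19),
                col = c("black", "red", "blue"),
                cex = 1.2)

# auto.key builds its legend from the superpose.symbol settings, so passing
# the symbols through par.settings keeps the points and the key in sync
p <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
            par.settings = list(superpose.symbol = sup.sym),
            auto.key = list(columns = 3))

# print(p) draws the plot on the active device
```

Setting pch and col directly as xyplot arguments changes the points but not the auto.key legend, which is why the par.settings route is needed.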
[R] Modelling Crystal Growth
Dear all, I would like to hear from anyone who has experience using R to simulate and visualise the formation and growth of crystals. Thank you. mpl -- View this message in context: http://r.789695.n4.nabble.com/Modelling-Crystal-Growth-tp2268746p2268746.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Forcing scalar multiplication.
?sweep

On Fri, Jun 25, 2010 at 2:43 PM, rkevinbur...@charter.net wrote:

I am trying to check the results from an Eigen decomposition and I need to force a scalar multiplication. The fundamental equation is: Ax = lx. Where 'l' is the eigen value and x is the eigen vector corresponding to the eigenvalue. 'R' returns the eigenvalues as a vector (e <- eigen(A); e$values). So in order to 'check' the result I would multiply the eigenvalues ('l') by the eigenvectors. But unless I do it one by one (say e$values[1] * e$vectors[,1]) 'R' tries a matrix multiplication and that is not what I want. I would like a matrix that is formed by the SCALAR multiplication of each of the values by the corresponding eigenvector. How can I force such a multiplication? Thank you. Kevin
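To spell out the one-word answer: sweep() can multiply each column of the eigenvector matrix by its own eigenvalue. The matrix A below is a made-up symmetric example.

```r
# Hypothetical symmetric matrix, chosen only for illustration
A <- matrix(c(2, 1, 1, 3), nrow = 2)
e <- eigen(A)

# Left-hand side: A times every eigenvector at once (columns of e$vectors)
lhs <- A %*% e$vectors

# Right-hand side: column j of e$vectors scaled by e$values[j]
# MARGIN = 2 sweeps across columns, FUN = "*" multiplies
rhs <- sweep(e$vectors, 2, e$values, FUN = "*")

# An equivalent formulation using a diagonal matrix
rhs2 <- e$vectors %*% diag(e$values)

check <- isTRUE(all.equal(lhs, rhs)) && isTRUE(all.equal(rhs, rhs2))
print(check)
```

Note that plain e$values * e$vectors recycles the eigenvalues down the rows rather than across the columns, so it scales row i by eigenvalue i, which is not the desired product.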
Re: [R] Popularity of R, SAS, SPSS, Stata...
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Muenchen, Robert A (Bob) Sent: Friday, June 25, 2010 3:08 PM To: Joris Meys; Dario Solari Cc: r-help@r-project.org Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

I had taken the opposite tack with Google Trends by subtracting keywords like: SAS -shoes -airlines -sonar... but never got as good results as that beautiful "X code for" search. When you see the end-of-semester panic bumps in traffic, you know you're nailing it!

I have to eat those words already. The "R code for" search that showed a peak every December did not have quotes around it, so it was searching for those three words, not the complete phrase. When you add the quotes, the peaks vanish. Once you go the phrase route, you gain precision but end up with zero counts on various phrases. I avoided that by combining them with + to get enough to plot. The resulting graph shows SAS dominant until mid-2006 when SPSS takes the top position, followed by R, SAS, Stata in order:

http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20manual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22%2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22spss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22spss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22stata%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22s-plus%20manual%22%2Bs-plus%20tutorial%22%2B%22s-plus%20graph%22&cmpt=q

This might be a good one to add to http://r4stats.com/popularity

Bob

I see that there's a car, the R Code Mustang, that adding "for" gets rid of. Thanks for getting me back on a topic that I had given up on!
Bob

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys
Sent: Thursday, June 24, 2010 7:56 PM
To: Dario Solari
Cc: r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

Nice idea, but quite sensitive to search terms, if you compare your result on "... code" with "... code for":
http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q

On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote:
> First: excuse my English.
>
> My opinion: a useful source for measuring popularity can be Google Insights for Search - http://www.google.com/insights/search/#
>
> Every person using software like R, SAS or SPSS first needs to learn it. So he probably makes a web search for a manual, a tutorial, a guide. One can measure the share of this kind of search query. Results of this kind can be useful to determine trends in popularity.
>
> Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide, SPSS tutorial/manual/guide
> http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q
>
> Example 2: R software, SAS software, SPSS software
> http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22&cmpt=q
>
> Example 3: R code, SAS code, SPSS code
> http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22&cmpt=q
>
> Example 4: R graph, SAS graph, SPSS graph
> http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22&cmpt=q
>
> Example 5: R regression, SAS regression, SPSS regression
> http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22&cmpt=q
>
> Some examples are cross-software (learning needs - Example 1), others can be biased by the traditional use of that software (in SPSS you usually don't manipulate graphs, I think).

--
Joris Meys
Statistical consultant
Ghent University, Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] exists() and functions
Always nice to answer my own question 3 minutes later. The missing() function does what I want. Still, why DOES this exists() statement fail? Do functions automatically create their argument variables once they are called, regardless of whether or not they are assigned?

--j

On Fri, Jun 25, 2010 at 1:05 PM, Jonathan Greenberg greenb...@ucdavis.edu wrote:
> I'm a bit confused about how exists() works within a function -- I want to test for unassigned variables, but I'm doing tests in the main environment to figure out the function, so the variables DO exist in the parent environment of a function call. Why does:
>
>     myfunction <- function(variable_outside_function) {
>       print(exists("variable_outside_function", inherits=FALSE))
>       print(exists("another_variable_outside_function", inherits=FALSE))
>     }
>     myfunction()
>
> Return:
>
>     [1] TRUE
>     [1] FALSE
>
> I didn't assign anything to variable_outside_function, so I'm unclear why it thinks it exists...
>
> --j
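A possible explanation, offered as a sketch rather than an authoritative answer: when a function is called, R binds every formal argument as a name in the function's own evaluation frame, even if the caller supplied no value, so exists() with inherits = FALSE finds the name there. missing() is what reports whether a value was actually supplied. The function and variable names below are made up for illustration.

```r
# Hypothetical helper: compares exists() and missing() for a formal argument.
check_args <- function(maybe_supplied) {
  env <- environment()  # the function's own evaluation frame

  # The formal argument is always bound in this frame, supplied or not.
  found_formal <- exists("maybe_supplied", envir = env, inherits = FALSE)

  # A name that is not a formal argument and was never assigned here.
  found_other <- exists("never_defined_anywhere_xyz", envir = env,
                        inherits = FALSE)

  # missing() tells us whether the caller actually passed a value.
  was_missing <- missing(maybe_supplied)

  c(found_formal = found_formal,
    found_other  = found_other,
    was_missing  = was_missing)
}

res_missing <- check_args()    # no argument supplied
res_given   <- check_args(1)   # argument supplied

print(res_missing)
print(res_given)
```

In both calls found_formal is TRUE, which matches the output reported in the question; only was_missing distinguishes the two cases.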
[R] exists() and functions
I'm a bit confused about how exists() works within a function -- I want to test for unassigned variables, but I'm doing tests in the main environment to figure out the function, so the variables DO exist in the parent environment of a function call. Why does:

    myfunction <- function(variable_outside_function) {
      print(exists("variable_outside_function", inherits=FALSE))
      print(exists("another_variable_outside_function", inherits=FALSE))
    }
    myfunction()

Return:

    [1] TRUE
    [1] FALSE

I didn't assign anything to variable_outside_function, so I'm unclear why it thinks it exists...

--j
[R] Export Results
Hi R users,

How can I automatically export results and graphs to a file?

Thanks in advance
Pedro Mota Veiga
[R] Lattice plotting question
Hi all,

I'm working on some plots using lattice (R 2.10.1), and have entered the polish phase. I've produced a satisfactory pair of xyplots (http://imgur.com/EyXGi.png), but would like to align the y-axes of the top and bottom plots. I assume that I need to adjust axis padding or something, but I can't figure this one out.

Thanks for any help!
Dave

--
Post-doctoral Fellow
Neurology Department
University of Iowa Hospitals and Clinics
davideugenewar...@gmail.com
[R] Diebold Mariano
Hello,

I am trying to calculate the Diebold-Mariano (DM) test statistic using the dm.test function. I also tried to do the same thing in Stata and I get vastly different results (4.5 vs 25). Does anyone have experience with this function?

I tried to calculate the DM statistic manually. If I define "d" as the difference of squared forecast errors for two models, then DM = mean(d)/sqrt(long_run_var(d)). To calculate the long-run variance of d I use Newey-West standard errors. What I don't understand in the Newey-West command is the meaning of "prewhite". Any help?

Thanks!
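The manual calculation described above can be sketched in base R as follows. This is a minimal illustration on simulated forecast errors, not the dm.test implementation: the truncation lag L, the Bartlett weights and the simulated series are all assumptions made for the example. Note that here the long-run variance is divided by the sample size before taking the square root; whether that scaling has been applied is one thing worth checking when two implementations disagree, though that is a guess and not a diagnosis of the 4.5 vs 25 difference.

```r
set.seed(1)

# Simulated one-step forecast errors for two hypothetical models.
n  <- 200
e1 <- rnorm(n)
e2 <- rnorm(n, sd = 1.2)

# Loss differential under squared-error loss.
d <- e1^2 - e2^2

# Hypothetical truncation lag for the Bartlett (Newey-West style) kernel.
L <- 4

# Sample autocovariances of d at lags 0..L (acf() demeans d itself).
gam <- as.numeric(acf(d, lag.max = L, type = "covariance", plot = FALSE)$acf)

# Bartlett weights for lags 1..L.
w <- 1 - (1:L) / (L + 1)

# Long-run variance estimate: gamma_0 + 2 * sum_k w_k * gamma_k.
lrv <- gam[1] + 2 * sum(w * gam[-1])

# DM statistic: mean loss differential over its estimated standard error.
dm_stat <- mean(d) / sqrt(lrv / n)
print(dm_stat)
```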
Re: [R] Export Results
See ?Sweave

On Fri, Jun 25, 2010 at 12:58 PM, Pedro Mota Veiga motave...@net.sapo.pt wrote:
> Hi R users,
>
> How can I automatically export results and graphs to a file?
>
> Thanks in advance
> Pedro Mota Veiga

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
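Besides Sweave, a lighter-weight route is to write text output and plots to files directly. The sketch below is only one way to do it, assuming plain text and PDF are acceptable formats; the file names and the use of the built-in cars data are choices made for the example.

```r
# Write into a temporary directory so the example leaves no files behind.
out_dir <- tempdir()

# Some result to export: a simple regression on the built-in 'cars' data.
fit <- lm(dist ~ speed, data = cars)

# Text results: capture the printed summary and write it to a file.
txt_file <- file.path(out_dir, "results.txt")
writeLines(capture.output(summary(fit)), txt_file)

# Graph: open a PDF device, draw, then close the device to flush the file.
pdf_file <- file.path(out_dir, "fit_plot.pdf")
pdf(pdf_file)
plot(dist ~ speed, data = cars)
abline(fit)
invisible(dev.off())
```

sink() can be used instead of capture.output() when everything printed during a session should go to the file.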
Re: [R] Trying to tile wireframe plots (using lattice package)
Thanks, that was the pointer I needed. I'd tried the split parameter but didn't realize that it doesn't work well within wireframe() itself; rather, I had to call print.trellis() directly on the trellis object that wireframe() returns when one assigns it to something. After that, it was pretty straightforward.

One issue I found surprising was that you must pass more=TRUE to the call _before_ the one that adds more, rather than to the call that is actually supposed to draw onto a pre-existing canvas. But that was a quick fix.

Here is code that worked for me.

    ## Example begins
    library(lattice)

    top.left     = wireframe(volcano)
    top.right    = wireframe(volcano, shade = TRUE)
    bottom.left  = wireframe(volcano, shade = TRUE, aspect = c(61/87, 0.4))
    bottom.right = wireframe(volcano, shade = TRUE, aspect = c(61/87, 0.4),
                             light.source = c(10,0,10))

    ## split = c(column, row, ncolumns, nrows); more=TRUE keeps the page open
    print(top.left,     split=c(1,1,2,2), more=TRUE)
    print(top.right,    split=c(2,1,2,2), more=TRUE)
    print(bottom.left,  split=c(1,2,2,2), more=TRUE)
    print(bottom.right, split=c(2,2,2,2))
    ## Example ends

Thanks again!
Magnus

On 6/25/2010 2:59 PM, Greg Snow wrote:
> The layout function is base graphics, wireframe from lattice is grid based and they don't play well together without extra effort.
[R] fatal error: unable to restore saved data
I just installed R 2.11.1 on my computer and I encountered a fatal error: "Unable to restore saved data in .RData", which kicks me out of R right away. I can still run 2.10.2. There is no package called rattle. I checked various posts regarding this error but I still can't get it to work. I removed two files that had the .rdata extension and it still does not work. Any suggestions? Please advise. How do you check your current working directory?

Albert
Re: [R] fatal error: unable to restore saved data
Albert -

The message refers to a file specifically called .RData. Files with an extension of .rdata are not related. You can see your current working directory by typing

    getwd()

at the R prompt. I'm not sure where rattle enters into the picture.

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spec...@stat.berkeley.edu
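A small sketch of the check described above, assuming the problematic workspace file sits in the directory R starts in. It only looks for the file; it does not delete or modify anything.

```r
# Where is R currently working?
wd <- getwd()
print(wd)

# The file R tries to restore at startup is named exactly ".RData".
saved <- file.path(wd, ".RData")
has_workspace <- file.exists(saved)
print(has_workspace)

# If the file exists and cannot be restored, it can be renamed or moved
# by hand; starting R with the --no-restore option skips loading it.
```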
[R] Average 2 Columns when possible, or return available value
Forum,

Using the following data:

    DF <- read.table(textConnection("A B
    22.60 NA
    NA NA
    NA NA
    NA NA
    NA NA
    NA NA
    NA NA
    NA NA
    102.00 NA
    19.20 NA
    19.20 NA
    NA NA
    NA NA
    NA NA
    11.80 NA
    7.62 NA
    NA NA
    NA NA
    NA NA
    NA NA
    NA NA
    75.00 NA
    NA NA
    18.30 18.2
    NA NA
    NA NA
    8.44 NA
    18.00 NA
    NA NA
    12.90 NA"),header=T)
    closeAllConnections()

The second column is a duplicate reading of the first column, and when two values are available, I would like to average column 1 and 2 (example code below). But if there is only one reading, I would like to retain it, and I haven't found a good way to exclude NA's using the following code:

    t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))

Currently, row 24 is the only row with a returned value. I'd like the result to return column A if it is the only available value, and average where possible. Of course, if both columns are NA, NA is the only possible result.

The result I'm after would look like this (row 24 is an avg):

    22.60 NA NA NA NA NA NA NA 102.00 19.20
    19.20 NA NA NA 11.80 7.62 NA NA NA NA
    NA 75.00 NA 18.25 NA NA 8.44 18.00 NA 12.90

This is a small example from a much larger data frame, so if you're wondering what the deal is with list(), that will come into play for the larger problem I'm trying to solve.

Respectfully,
Eric
Re: [R] Average 2 Columns when possible, or return available value
Eric -

What you're describing is taking the mean of each row while ignoring missing values:

    apply(DF,1,mean,na.rm=TRUE)
     [1]  22.60    NaN    NaN    NaN    NaN    NaN    NaN    NaN 102.00  19.20
    [11]  19.20    NaN    NaN    NaN  11.80   7.62    NaN    NaN    NaN    NaN
    [21]    NaN  75.00    NaN  18.25    NaN    NaN   8.44  18.00    NaN  12.90

If this isn't suitable for your larger problem, please describe that problem in greater detail.

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spec...@stat.berkeley.edu
Re: [R] Average 2 Columns when possible, or return available value
Hello Eric,

I am not sure how your need to use list() will fit in with this, but for your sample data, this will do the trick:

    matrix(rowMeans(DF, na.rm=TRUE), ncol=1)

HTH,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Re: [R] Average 2 Columns when possible, or return available value
btw, if you just wanted your exact code to work:

    t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean, na.rm=TRUE)[,-1]))

You will get NaNs rather than NAs where the value is missing from both columns, but that should not be a real issue.

snip

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
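If the NaN values do matter, for instance because the requested result shows NA where both readings are missing, they can be converted afterwards. A minimal sketch on a four-row stand-in for the posted data (the small data frame is made up for brevity):

```r
# Four rows covering the cases in the question: only A present,
# both missing, both present, only A present.
DF <- data.frame(A = c(22.6, NA, 18.3, 8.44),
                 B = c(NA,   NA, 18.2, NA))

# Row means that ignore missing values; all-NA rows come back as NaN.
avg <- rowMeans(DF, na.rm = TRUE)

# Turn NaN (mean of zero values) back into NA.
avg[is.nan(avg)] <- NA

print(avg)  # values: 22.60, NA, 18.25, 8.44
```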
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
Thanks for the link, very interesting book. Yet, I couldn't find the part about EDMA. It would have surprised me anyway, as the input of multidimensional scaling is one matrix with Euclidean distances between your observations, whereas in EDMA the data consist of a number of distance matrices. Quite a different thing if you ask me. Neither cmdscale nor isoMDS or its derived functions (e.g. metaMDS in the vegan package) are going to be of any help.

Now I come to think of it, vegan has a procrustes function, but I'm not sure if it is generalized enough to be of use in EDMA.

Cheers
Joris

On Fri, Jun 25, 2010 at 7:42 PM, Kjetil Halvorsen kjetilbrinchmannhalvor...@gmail.com wrote:
> There is a freely downloadable and very relevant (and readable) book at https://ccrma.stanford.edu/~dattorro/mybook.html, "Convex Optimization and Euclidean Distance Geometry", and it indeed names EDMA as a form of multidimensional scaling (or maybe the other way around). You should have a look at the code for multidimensional scaling in R.
>
> Kjetil
>
> On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote:
> > thanks for your interest Joris
> >
> > Gokhan OCAKOGLU
> > Uludag University Faculty of Medicine
> > Department of Biostatistics
> > http://www20.uludag.edu.tr/~biostat/ocakoglui.htm

--
Joris Meys
Statistical consultant
Ghent University, Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Lattice plotting question
ylim = extendrange(c(0,100)) ?

On 26 June 2010 01:42, David Warren davideugenewar...@gmail.com wrote:
> Hi all,
>
> I'm working on some plots using lattice (R 2.10.1), and have entered the polish phase. I've produced a satisfactory pair of xyplots (http://imgur.com/EyXGi.png), but would like to align the y-axes of the top and bottom plots. I assume that I need to adjust axis padding or something, but I can't figure this one out.
>
> Thanks for any help!
> Dave

--
Felix Andrews / 安福立
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
http://www.neurofractal.org/felix/
[R] use a data frame whose name is stored as a string variable?
Hi,

Let's say I have a data frame (called example) with numeric values stored (columns V1 and V2). I also have a string variable storing this name:

    x1 <- "example"

Is there a way to use the variable x so that R knows that I want the specified action to occur on the data frame? For example, summary(x) would return a summary of the data frame?

I am considering this because I need to compare many data frames within 2 nested for loops. In the first iteration of the loop I could concatenate x and 1 and then use it to represent the data frame. I'm open to a better solution.

Thanks,
Seth Myers
[R] Label Values in levelplot
I am trying to add labels equal to the value in a levelplot. I believe that panel may be the way to go but cannot understand the examples.

In the following example:

    X,Y,Z
    A,M,100
    A,M,200
    B,N,150
    B,N,225

I would like to label each of the rectangles 100, 200, 150 and 225 and colour them according to the value.

The colouring is achieved by

    levelplot(z ~ x * y, data)

but then I get stuck with the labels.

Thanks very much for your help
Ben
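One way to do this with a custom panel function is sketched below: draw the usual coloured rectangles with panel.levelplot(), then write each value at the centre of its cell with panel.text(). The data frame is an assumption made for the example; unlike the posted table it gives every value its own (x, y) cell, since two values sharing a cell would be drawn on top of each other.

```r
library(lattice)

# Hypothetical data: one z value per (x, y) cell.
dat <- data.frame(
  x = factor(c("A", "A", "B", "B")),
  y = factor(c("M", "N", "M", "N")),
  z = c(100, 200, 150, 225)
)

p <- levelplot(
  z ~ x * y, data = dat,
  panel = function(x, y, z, subscripts, ...) {
    # Standard coloured cells.
    panel.levelplot(x, y, z, subscripts = subscripts, ...)
    # Value of each cell written at its centre.
    panel.text(x[subscripts], y[subscripts], labels = z[subscripts])
  }
)

# Draw to a temporary PDF so the example runs without a screen device.
pdf(file.path(tempdir(), "levelplot_labels.pdf"))
print(p)
invisible(dev.off())
```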
Re: [R] use a data frame whose name is stored as a string variable?
On Fri, Jun 25, 2010 at 5:10 PM, Seth sjmy...@syr.edu wrote:
> Hi,
>
> Let's say I have a data frame (called example) with numeric values stored (columns V1 and V2). I also have a string variable storing this name:
>
>     x1 <- "example"
>
> Is there a way to use the variable x so that R knows that I want the specified action to occur on the data frame? For example, summary(x) would return a summary of the data frame?

?get

For example:

    get(x)                     # one object
    mget(x, envir=.GlobalEnv)  # for multiple objects
    ## just change the environment if that is not where they are located

> I am considering this because I need to compare many data frames within 2 nested for loops. In the first iteration of the loop I could concatenate x and 1 and then use it to represent the data frame. I'm open to a better solution.
>
> Thanks,
> Seth Myers

It is hard to give a better solution without the rest of your code, but there are often cleaner ways than for loops. One solution that avoids the character vector is to put the data frames together in a list.

Best regards,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
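Both suggestions can be sketched as follows. The contents of the data frame and the second list element are invented for the example; only the name example and the columns V1 and V2 come from the question.

```r
# A data frame called 'example', as in the question (values are made up).
example <- data.frame(V1 = c(1, 2, 3), V2 = c(4, 5, 6))

# Its name stored as a string.
x <- "example"

# get() looks the object up by name, so this summarises the data frame.
s <- summary(get(x))
print(s)

# Alternative without name lookups: keep the data frames in a named list
# and iterate over it instead of pasting names together inside a loop.
frames <- list(
  first  = example,
  second = transform(example, V2 = V2 * 2)
)
means <- sapply(frames, function(df) mean(df$V2))
print(means)
```

With a list, a pairwise comparison can index frames[[i]] and frames[[j]] directly.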
Re: [R] Average 2 Columns when possible, or return available value
Just want to add that if you want to clean out the NA rows in a matrix or data frame, take a look at ?complete.cases. Can be handy to use with big datasets. I got curious, so I just ran the codes given here on a big dataset, before and after removing NA rows. I have to be honest, this is surely an illustration of the power of rowMeans. I'm amazed myself.

    DF <- data.frame( A=rep(DF$A,1), B=rep(DF$B,1) )

    system.time(apply(DF,1,mean,na.rm=TRUE))
       user  system elapsed
      13.26    0.06   13.46

    system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
       user  system elapsed
       0.03    0.00    0.03

    system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
        na.rm=TRUE)[,-1]))
    )
    Timing stopped at: 227.84 1.03 249.31    -- I got impatient and pressed the escape

    DF <- DF[complete.cases(DF),]

    system.time(apply(DF,1,mean,na.rm=TRUE))
       user  system elapsed
       0.39    0.00    0.39

    system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
       user  system elapsed
       0.01    0.00    0.02

    system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
        na.rm=TRUE)[,-1]))
    )
       user  system elapsed
      10.01    0.07   13.40

Cheers
Joris

--
Joris Meys
Statistical consultant
Ghent University, Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
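One caveat worth illustrating, as a sketch on made-up data: complete.cases() keeps only rows in which every column is present, so rows with a single reading are dropped too. If the aim is only to discard rows where both readings are missing, a count of non-missing values per row is an alternative.

```r
# Stand-in data: only A, both missing, both present, only B.
DF <- data.frame(A = c(22.6, NA, 18.3, NA),
                 B = c(NA,   NA, 18.2, 7.0))

# Rows with no NA at all: just the row where both readings exist.
both_present <- DF[complete.cases(DF), ]

# Rows with at least one reading: drops only the all-NA row.
any_present <- DF[rowSums(!is.na(DF)) > 0, ]

print(nrow(both_present))  # 1
print(nrow(any_present))   # 3
```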
[R] All a column to a data frame with a specific condition
Hi, folks,

Please first look at the code:

    plan_a=c('apple','orange','apple','apple','pear','bread')
    plan_b=c('bread','bread','orange','bread','bread','yogurt')
    value=1:6
    data=data.frame(plan_a,plan_b,value)

    library(plyr)
    library(reshape)
    mm=melt(data, id=c('plan_a','plan_b'))
    sum_plan_a=cast(mm,plan_a~variable,sum)

I would like to add a new column to the data.frame named 'data', with the same sum of value for the same type of plan_a. The result should come up like this:

      plan_a plan_b value sum_plan_a
    1  apple  bread     1          8
    2 orange  bread     2          2
    3  apple orange     3          8
    4  apple  bread     4          8
    5   pear  bread     5          5
    6  bread yogurt     6          6

Any tips? Thank you.
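A base R sketch of one way to get that column, using ave(), which applies a function within groups and returns a vector aligned with the original rows. This is offered as one option and does not use the plyr/reshape objects from the code above.

```r
plan_a <- c("apple", "orange", "apple", "apple", "pear", "bread")
plan_b <- c("bread", "bread", "orange", "bread", "bread", "yogurt")
value  <- 1:6
data   <- data.frame(plan_a, plan_b, value)

# For every row, the sum of 'value' over all rows sharing its plan_a.
data$sum_plan_a <- ave(data$value, data$plan_a, FUN = sum)

print(data)
```

The new column reads 8, 2, 8, 8, 5, 6, matching the table requested above.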
[R] predict newdata question
Hi: I am using a subset of the below dataset to predict PRED_SUIT for the whole dataset but I am having trouble with 'newdata'. The model was created with 153 records and want to predict for 208 records. wolf2 - structure(list(gridcell = c(367L, 444L, 533L, 587L, 598L, 609L, 620L, 629L, 641L, 651L, 662L, 674L, 684L, 695L, 738L, 748L, 804L, 805L, 872L, 919L, 929L, 938L, 950L, 958L, 966L, 975L, 976L, 985L, 994L, 1006L, 1015L, 1019L, 1022L, 1025L, 1027L, 1028L, 1029L, 1032L, 1040L, 1043L, 1050L, 1053L, 1061L, 1070L, 1074L, 1078L, 1080L, 1082L, 1083L, 1084L, 1090L, 1095L, 1096L, 1099L, 1106L, 1116L, 1124L, 1125L, 1130L, 1133L, 1134L, 1137L, 1138L, 1139L, 1145L, 1150L, 1151L, 1154L, 1161L, 1162L, 1163L, 1171L, 1175L, 1179L, 1181L, 1184L, 1188L, 1189L, 1193L, 1194L, 1199L, 1204L, 1207L, 1214L, 1222L, 1231L, 1232L, 1241L, 1250L, 1256L, 1275L, 1279L, 378L, 421L, 432L, 480L, 492L, 501L, 511L, 522L, 545L, 555L, 566L, 575L, 705L, 716L, 728L, 760L, 774L, 785L, 794L, 816L, 831L, 841L, 850L, 860L, 861L, 873L, 889L, 899L, 908L, 917L, 931L, 933L, 942L, 944L, 954L, 963L, 971L, 986L, 988L, 996L, 997L, 1007L, 1009L, 1014L, 1041L, 1052L, 1062L, 1064L, 1069L, 1107L, 1108L, 1117L, 1120L, 1172L, 1216L, 1225L, 1239L, 1245L, 1265L, 1287L, 1293L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L), MAJOR_LC = c(42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 51L, 51L, 42L, 42L, 42L, 71L, 51L, 51L, 51L, 71L, 71L, 51L, 42L, 71L, 42L, 51L, 51L, 42L, 51L, 42L, 51L, 42L, 51L, 51L, 51L, 42L, 51L, 42L, 51L, 71L, 42L, 51L, 42L, 42L, 51L, 51L, 42L, 51L, 42L, 42L, 51L, 51L, 51L, 71L, 51L, 42L, 51L, 42L, 51L, 71L, 42L, 51L, 42L, 42L, 51L, 51L, 42L, 51L, 51L, 71L, 82L, 51L, 42L, 51L, 51L, 42L, 82L, 83L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 71L, 51L, 51L, 51L, 31L, 81L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 81L, 81L, 42L, 42L, 42L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 42L, 31L, 42L, 81L, 43L, 41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L), RD_DENSITY = c(1.046, 1.626, 2.356, 1.912, 0.203, 0.049, 0.055, 1.96, 1.515, 0.361, 0.183, 0.022, 1.702, 0.8, 1.356, 0.216, 0.509, 0.915, 0.689, 0.817, 0.93, 0.808, 0.121, 0.026, 0.283, 1.256, 0.56, 0.881, 0.649, 1.074, 0.851, 0.758, 0.375, 0.554, 1.111, 0.783, 1.113, 0.619, 0.587, 0.975, 0.892, 0.162, 0.714, 1.582, 0.408, 0.227, 1.816, 1.586, 0.888, 1.247, 2.016, 0.457, 0.816, 0.933, 0.894, 2.101, 0.091, 2.265, 0.389, 0.343, 1.718, 0.738, 0.597, 1.098, 1.865, 1.082, 0.654, 1.104, 0.43, 0.418, 0.164, 1.068, 0.708, 0.011, 1.61, 1.143, 0.124, 2.039, 0.547, 0.794, 1.694, 0.526, 1.505, 0.861, 0.771, 0.216, 1.018, 2.88, 0.892, 0.741, 0.437, 1.16, 0.966, 0.961, 0.591, 2.052, 0.82, 0.638, 2.107, 3.082, 0.387, 0.716, 1.065, 1.602, 0.93, 0.234, 0.257, 0.186, 0, 0.408, 0.914, 0.281, 0.019, 0.13, 0.704, 0.305, 1.132, 0.347, 0, 0.252, 0.733, 0.925, 0.276, 0.368, 0.596, 0.284, 0.158, 0.627, 0.719, 0.472, 0.264, 0.251, 0.525, 0.231, 0.568, 0.204, 0.44, 0.466, 0.19, 0.134, 0.001, 0.422, 0.2, 0.073, 0.528, 0, 0.42, 0.626, 0.121, 0.181, 1.324, 1.265, 0.827, 11.611, 3.443, 5.382, 2.269, 3.677, 1.1, 4.876, 0.003, 2.86, 2.375, 1.885, 0.044, 0.728, 1.314, 3.042, 0.469, 0.248, 0.675, 1.91, 0.228, 4.058, 3.563, 0.801, 3.421, 0.515, 1.945, 1.235, 1.999, 2.495, 1.193, 1.896, 1.689, 1.144, 1.028, 0.858, 1.703, 4.009, 0.096, 1.85, 0.081, 0, 1.759, 5.549, 4.99, 4.267, 1.792, 0.204, 2.144, 0.212, 9.263, 1.615, 3.502, 1.927, 1.665, 2.17), 
WOLVES_99 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), WOLVES_01 = c(0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L,
Re: [R] Average 2 Columns when possible, or return available value
On Fri, Jun 25, 2010 at 5:24 PM, Joris Meys jorism...@gmail.com wrote:

Just want to add that if you want to clean out the NA rows in a matrix or data frame, take a look at ?complete.cases. Can be handy to use with big datasets. I got curious, so I just ran the codes given here on a big dataset, before and after removing NA rows. I have to be honest, this is surely an illustration of the power of rowMeans. I'm amazed myself.

I was too...the documentation (?rowMeans) wasn't joking:

These functions are equivalent to use of 'apply' with 'FUN = mean' or 'FUN = sum' with appropriate margins, but are a lot faster.

DF <- data.frame( A=rep(DF$A,1), B=rep(DF$B,1) )

system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
  13.26    0.06   13.46

system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.03    0.00    0.03

system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,na.rm=TRUE)[,-1])))
Timing stopped at: 227.84 1.03 249.31 -- I got impatient and pressed the escape

DF <- DF[complete.cases(DF),]

system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
   0.39    0.00    0.39

system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.01    0.00    0.02

system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,na.rm=TRUE)[,-1])))
   user  system elapsed
  10.01    0.07   13.40

Cheers
Joris

On Sat, Jun 26, 2010 at 1:08 AM, emorway emor...@engr.colostate.edu wrote:

Forum,

Using the following data:

DF <- read.table(textConnection("A B
22.60 NA
NA NA
NA NA
NA NA
NA NA
NA NA
NA NA
NA NA
102.00 NA
19.20 NA
19.20 NA
NA NA
NA NA
NA NA
11.80 NA
7.62 NA
NA NA
NA NA
NA NA
NA NA
NA NA
75.00 NA
NA NA
18.30 18.2
NA NA
NA NA
8.44 NA
18.00 NA
NA NA
12.90 NA"),header=T)
closeAllConnections()

The second column is a duplicate reading of the first column, and when two values are available, I would like to average column 1 and 2 (example code below).
But if there is only one reading, I would like to retain it, but I haven't found a good way to exclude NA's using the following code:

t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))

Currently, row 24 is the only row with a returned value. I'd like the result to return column A if it is the only available value, and average where possible. Of course, if both columns are NA, NA is the only possible result. The result I'm after would look like this (row 24 is an avg):

22.60 NA NA NA NA NA NA NA 102.00 19.20 19.20 NA NA NA 11.80 7.62 NA NA NA NA NA 75.00 NA 18.25 NA NA 8.44 18.00 NA 12.90

This is a small example from a much larger data frame, so if you're wondering what the deal is with list(), that will come into play for the larger problem I'm trying to solve.

Respectfully,
Eric

--
View this message in context: http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
Sent from the R help mailing list archive at Nabble.com.

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
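
For anyone reading this thread later, here is a minimal sketch of the rowMeans() approach discussed above. It runs on a six-row stand-in for the posted data; the subset and the names avg and both are illustrative assumptions, not part of the original posts. One detail worth checking on your own data: with na.rm = TRUE, a row where both columns are NA comes back as NaN rather than NA, so the sketch converts those afterwards.

```r
# Sketch only: a six-row stand-in for the poster's 30-row data frame.
# The row selection is an assumption made for illustration.
DF <- data.frame(
  A = c(22.60, NA, 102.00, 18.30, NA, 8.44),
  B = c(NA,    NA, NA,     18.2,  NA, NA)
)

# rowMeans() drops NAs per row when na.rm = TRUE, so a single reading
# is kept as-is and two readings are averaged.
avg <- rowMeans(DF, na.rm = TRUE)

# Rows with no readings at all yield NaN; turn them into NA to match
# the result asked for in the original question.
avg[is.nan(avg)] <- NA

# complete.cases() flags the rows that contain no NA in any column.
both <- complete.cases(DF)

print(avg)
print(DF[both, ])
```

Whether to drop incomplete rows with complete.cases() before averaging depends on the goal: it removes every row with a single reading as well, which is not what the original question wanted, so here it is only used to show which rows carry two readings.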