Re: [R] Mantel test
Please keep the discussion on the list. It is hard to answer your question without context. You give us a warning message from a function that is not in base R. Perhaps it is in package ade4? You also don’t include anything about the data or the commands that produced the warning message. A dist object does not contain a diagonal, so your comment suggests that you did not convert the matrix to a dist object.

David C

From: Nick Jeffery [mailto:nick.w.jeffe...@gmail.com]
Sent: Wednesday, May 6, 2015 9:34 AM
To: David L Carlson
Subject: Re: [R] Mantel test

Hi,

Thanks for the help. I get these warnings when I run the Mantel test however - is this because the diagonal of the matrix is all 0s? Both are symmetrical matrices about the diagonal line of zeroes.

Warning messages:
1: In is.euclid(m1) : Zero distance(s)
2: In is.euclid(m2) : Zero distance(s)

Thanks for your time,
Nick

On Mon, May 4, 2015 at 3:48 PM, David L Carlson <dcarl...@tamu.edu> wrote:

Assuming the 'matrix' format is a symmetrical distance 'matrix' stored as a data frame (which read.csv creates) rather than a rectangular data 'matrix,' you can convert it to a dist object with as.dist().

?dist

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Nick Jeffery
Sent: Monday, May 4, 2015 10:49 AM
To: r-help@r-project.org
Subject: [R] Mantel test

Dear R users,

I'm having trouble getting my data into R in the correct format to run a Mantel test. I'm testing genome size differences by genetic distances of the 28S gene for ~30 species.
I'm able to get my genome size data (as a single column of data) into matrix and dist formats in R but the genetic distances output by MEGA are already in 'matrix' format so I don't know how to load this CSV file into R without it calculating new genetic distances when I convert it to the dist form required by the test.

Thanks in advance,
Nick

--
Nick Jeffery, PhD Candidate Integrative Biology SCIE 1453 University of Guelph Guelph, Ontario, Canada

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mantel test
Assuming the 'matrix' format is a symmetrical distance 'matrix' stored as a data frame (which read.csv creates) rather than a rectangular data 'matrix,' you can convert it to a dist object with as.dist().

?dist

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Nick Jeffery
Sent: Monday, May 4, 2015 10:49 AM
To: r-help@r-project.org
Subject: [R] Mantel test

Dear R users,

I'm having trouble getting my data into R in the correct format to run a Mantel test. I'm testing genome size differences by genetic distances of the 28S gene for ~30 species. I'm able to get my genome size data (as a single column of data) into matrix and dist formats in R but the genetic distances output by MEGA are already in 'matrix' format so I don't know how to load this CSV file into R without it calculating new genetic distances when I convert it to the dist form required by the test.

Thanks in advance,
Nick

--
Nick Jeffery, PhD Candidate Integrative Biology SCIE 1453 University of Guelph Guelph, Ontario, Canada

[[alternative HTML version deleted]]
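A minimal sketch of the as.dist() route suggested above. The species names, the distance values, and the assumption that the exported CSV has species labels in its first column are all made up for illustration; a real MEGA export may need different read.csv() arguments.

```r
# Hypothetical stand-in for the symmetric genetic distance matrix exported
# from MEGA; names and values are invented for illustration.
species <- c("sp1", "sp2", "sp3", "sp4")
gen <- matrix(c(0.00, 0.12, 0.30, 0.25,
                0.12, 0.00, 0.28, 0.22,
                0.30, 0.28, 0.00, 0.10,
                0.25, 0.22, 0.10, 0.00),
              nrow = 4, dimnames = list(species, species))
csv_file <- tempfile(fileext = ".csv")
write.csv(gen, csv_file)

# read.csv() returns a data frame; use the first column as row names
# (an assumption about the file layout) and convert to a numeric matrix.
gen_df  <- read.csv(csv_file, row.names = 1)
gen_mat <- as.matrix(gen_df)
stopifnot(isSymmetric(unname(gen_mat)))

# as.dist() keeps the stored distances (lower triangle) and does NOT
# compute new ones, unlike dist().
gen_dist <- as.dist(gen_mat)

# Genome sizes are raw measurements, so here dist() is the right tool.
# These values are also invented.
genome_size <- c(sp1 = 1.2, sp2 = 1.5, sp3 = 3.1, sp4 = 2.8)
size_dist   <- dist(genome_size)

print(gen_dist)
print(size_dist)
```

Both objects are then of class "dist" and cover the same species in the same order, which is what a Mantel test function would be expected to need.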
Re: [R] Results Differ in Ternary Plot Matrix of Compositional Response Variables
Add plotMissings=FALSE to the second plot and see the plot.acomp manual page description of this argument: plot(JerrittY, pch=as.numeric(JerrittX4), col=c("black","red", "dark green", "dark blue","dark goldenrod","dark orange","dark grey")[JerrittX4], plotMissings=FALSE) - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rich Shepard Sent: Thursday, April 30, 2015 3:57 PM To: r-help@r-project.org Subject: [R] Results Differ in Ternary Plot Matrix of Compositional Response Variables After hours of looking for the reason why one data set plots correctly and another one does not I am still not seeing the reason. The only differences I see between the two data sets is the number of discrete variables (one has 6 years, the other 7 years) and one contains zeros. I wonder if the number of discrete variables is the issue. I'm sure that more experienced eyes will see the reason for the different results and will point it out to me. The following data and code produce a matrix of ternary plots with the other continuous variables represented by a dot above the top point of the triangle: "Year","NO3","SO4","pH","Fi","Ga","Gr","Pr","Sh" "2005",0.60,816,7.87,0.0556,0.5370,0.1667,0.1667,0.0741 "2006",0.40,224,7.59,0.0435,0.6739,0.0870,0.1522,0.0435 "2010",0.10,571,7.81,0.0735,0.4706,0.1029,0.1912,0.1618 "2011",0.52,130,7.42,0.0462,0.5692,0.0769,0.2462,0.0615 "2012",0.42,363,7.79,0.0548,0.5205,0.0548,0.2466,0.1233 "2013",0.42,363,7.79,0.0484,0.5323,0.1129,0.2419,0.0645 # Create matrix of ternary plots of FFGs as dependent variables. # Follows 'Analyzing Compositional Data with R' sec. 5.3; pp 122 ff # Change stream name as necessary. 
# load package from library
require(compositions)
# read in raw data
SnowRegr <- read.csv('snow-regression.dat', header=T)
# extract response variables
SnowY <- acomp(SnowRegr[,5:9])
# column headings; variables
names(SnowRegr)
# continuous explanatory co-variables
SnowCovars <- SnowRegr[,c("Year","NO3","SO4","pH")]
# first continuous co-variable
SnowX1 <- SnowCovars$NO3
# second continuous co-variable
SnowX2 <- SnowCovars$SO4
# third continuous co-variable
SnowX3 <- SnowCovars$pH
# discrete co-variable
SnowX4 <- factor(SnowCovars$Year,c("2005","2006","2010","2011","2012","2013"),ordered=T)
# for the discrete co-var, ANOVA not specified in unique way so contrasts must be specified; use the
# treatment contrasts.
contrasts(SnowX4) <- "contr.treatment"
# save figure parameters
opar <- par(xpd=NA,no.readonly=T)
# ternary plot matrix
plot(SnowY, pch=as.numeric(SnowX4), col=c("red","dark green","dark blue","dark goldenrod","dark orange","dark grey")[SnowX4])
# add legend
legend(x=0.83, y=-0.165, abbreviate(levels(SnowX4), minlength=1),pch=as.numeric(SnowX4), col=c("red","dark green","dark blue","dark goldenrod","dark orange","dark grey"), ncol=2, xpd=T, bty="n", yjust=0)
# reset plot parameters
par(opar)
# unload the package
detach('package:compositions')

This data set with equivalent code produces plots with the other continuous variables as bars with colors on the top points of the triangles:

"Year","NO3","SO4","pH","Fi","Ga","Gr","Pr","Sh"
"2004",1.70,2200,8.70,0.0444,0.6889,0.0222,0.,0.0222
"2005",2.50,5000,8.43,0.0182,0.5636,0.0909,0.3091,0.0182
"2006",1.80,6670,8.57,0.0370,0.6173,0.0741,0.2469,0.0247
"2010",0.54,4000,8.00,0.0870,0.6087,0.0870,0.2174,0.
"2011",2.70,4300,8.47,0.0449,0.5256,0.0897,0.2949,0.0449
"2012",0.76,595,8.21,0.,0.4231,0.0769,0.5000,0.
"2013",0.76,595,8.21,0.,0.4545,0.0455,0.4545,0.0455

# Create matrix of ternary plots of FFGs as dependent variables.
# Follows 'Analyzing Compositional Data with R' sec.
5.3; pp 122 ff # Change stream name as necessary. # load package from library require(compositions) # read in raw data JerrittRegr <- read.csv('jerritt-regression.dat', header=T) # extract response variables JerrittY <- acomp(JerrittRegr[,5:9]) # column headings; variables names(JerrittRegr) # continuous explanatory co-variables JerrittCovars <- JerrittRegr[,c("Year","NO3","SO4","pH")] # firs
Re: [R] Missing axis labels
I don't think you can tell in advance since the details of the plot are computed when you open the plot window and they change when you plot into the window. In principle you could estimate the size requirements for the axis labels if you know the plot window size and the character size. What gets plotted is also device dependent. For example if you open a window using x11(3, 3) (I'm on Windows so I haven't tried this on OS X) and produce the plot, the last x-axis label is missing just as in the pdf file. But if you drag the window to make it larger, the label will appear when the device driver redraws the plot.

There is also a third option, in addition to your two, for getting all of the labels:

plot(0:100, 0:100, xaxp=c(0, 100, 4))

will plot at 0, 25, 50, 75, 100 which leaves room for the last label.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Fisher Dennis
Sent: Friday, May 1, 2015 9:11 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] Missing axis labels

R 3.2.0
OS X

This is a general question, not specific to OS X.

Colleagues

Often, one or more values on an axis will be omitted, presumably in order to prevent overlap. However, there are situations where I would like to override that omission. Sample code:

pdf("labels.pdf", width=3, height=3)
plot(0:100, 0:100)
graphics.off()

Here, 100 is omitted from the x-axis and 20, 60, and 100 from the y-axis. Is there an automated way to detect which values will be omitted (i.e., without seeing the graphic)? If so, I see two options:
1. change the font size
2.
force the entry, e.g., axis(1, 100, at=100)

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
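A small sketch of two ways to get every tick label onto a 3 x 3 inch page, using base graphics only. The file name is a temporary file chosen for illustration. The second page relies on the behaviour described in ?axis, where labels are only suppressed against labels drawn earlier in the same axis() call; treat that as an assumption to check on your own device.

```r
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out_file, width = 3, height = 3)

# Page 1: fewer, wider-spaced ticks (0, 25, 50, 75, 100) on both axes,
# which leaves room for the last label.
plot(0:100, 0:100, xaxp = c(0, 100, 4), yaxp = c(0, 100, 4))

# Page 2: suppress the default x-axis, then draw each tick in its own
# axis() call so no label is dropped because of a neighbour.
plot(0:100, 0:100, xaxt = "n")
for (tick in seq(0, 100, by = 20)) {
  axis(1, at = tick)
}

dev.off()
```

With the second approach, labels that truly do not fit will overlap rather than disappear, so a smaller cex.axis may still be needed.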
Re: [R] Plot Title: Adjusting Position
There are half a dozen implementations of ternary plots in as many R packages so it is hard to be specific. Since you are using title(), try the "further graphical parameters from par" mentioned in the manual page, such as adj= for horizontal justification, the line= argument for vertical position, and cex.main= for size.

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rich Shepard
Sent: Friday, May 1, 2015 10:00 AM
To: r-help@r-project.org
Subject: [R] Plot Title: Adjusting Position

Plots of compositional data ternary diagrams do not accept the main label within the plot() function, but do print the label when it is specified within the title() function. On some of these plots I need to raise the position of the title just enough to move the text above the top row of diagrams. Applying the outer=TRUE option moves the title too high; the top half of the text is cut off from viewing. The help file, ?title, suggests that the line option applies to sub-titles and axis labels, not the main title. Setting character expansion to a negative value throws an error.

How can I either move the main title slightly higher on the figure or slightly reduce the text size so the title does not overlap part of the ternary diagrams?

Rich
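A sketch of the title() arguments discussed above, using an ordinary scatter plot as a stand-in for the ternary matrix (the compositions plotting method may manage margins differently, so this is an assumption to verify). The title text and the output file are invented for illustration.

```r
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out_file, width = 5, height = 5)

# Reserve three lines of outer margin at the top so an outer title is not
# clipped; par() returns the previous settings so they can be restored.
old_par <- par(oma = c(0, 0, 3, 0))

plot(1:10, main = "")                    # stand-in for plot(acomp_object)

# line= moves the title away from (larger) or toward (smaller) the plot;
# cex.main= shrinks the text. Both must be positive numbers.
title(main = "Functional feeding groups", outer = TRUE,
      line = 1, cex.main = 0.9)

par(old_par)
dev.off()
```

Without outer=TRUE, title(main=..., line=2.5) would be the analogous call inside the figure margin.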
Re: [R] Editable plot
Do not post in html. You need to change your email software so that it sends messages in plain text only. Look below to see why.

Your plot is edited by modifying the code you gave us to change the graph. Save the code in a script file, change it in any way you want and then run the code again to get a changed plot. You cannot edit the plot by selecting an element on the plot and changing its properties in some way.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of IZHAK shabsogh via R-help
Sent: Thursday, April 30, 2015 2:04 AM
To: R.
Subject: [R] Editable plot

Hello,
Kindly assist me on how to make the plot from the following program editable

x<-c(0.84,1.03,0.96)
y<-c(1.30,1.46,1.48)
z<-c(1.32,1.47,1.5)
w<-c(0.07,0.07,0.07)
r<-c(500,1000,2000)
# Graph cars using a y axis that ranges from 0 to 12
plot(r,x, type="o", col="blue", ylim=c(0,1.5),lwd= 2, xlab = " Number of iteration",ylab=" Bias" )
# Graph trucks with red dashed line and square points
lines(r,y, type="o", pch=22, lty=2, col="red",lwd=2)
lines(r,z, type="o", pch=22, lty=3, col="green",lwd=2)
lines(r,w, type="o", pch=22, lty=4, col="forestgreen",lwd=2)
# Create a title with a red, bold/italic font
#title(main="Estimated Bias for the optimal response ", col.main="red", font.main=4)
#legend("center", lty = 1:4, col = 1:4,
#legend = c("x","y", "z","w"))
text(1000, 0.15, "PM")
text(1000, 1.10, "VM")
text(1000, 1.52, "WMSE")
text(1000, 1.40, "LT")

Thank you
Ishaq

[[alternative HTML version deleted]]
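One way to make "edit the script and run it again" convenient is to wrap the plotting code in a function whose arguments are the things you expect to change. The sketch below does that with the data from the question; the function name bias_plot, its arguments, and the use of a legend in place of the hand-placed text labels are all choices made here for illustration, not part of the original code.

```r
# bias_plot() is a hypothetical helper: it redraws the whole figure into a
# PDF each time it is called, with a few appearance settings as arguments.
bias_plot <- function(file,
                      cols = c("blue", "red", "green", "forestgreen"),
                      ylim = c(0, 1.6),
                      main = NULL) {
  r <- c(500, 1000, 2000)
  bias <- list(x = c(0.84, 1.03, 0.96),
               y = c(1.30, 1.46, 1.48),
               z = c(1.32, 1.47, 1.50),
               w = c(0.07, 0.07, 0.07))

  pdf(file, width = 6, height = 5)
  on.exit(dev.off())                      # always close the device

  plot(r, bias$x, type = "o", col = cols[1], lwd = 2, ylim = ylim,
       xlab = "Number of iterations", ylab = "Bias", main = main)
  for (i in 2:4) {
    lines(r, bias[[i]], type = "o", pch = 22, lty = i, col = cols[i], lwd = 2)
  }
  legend("right", legend = names(bias), lty = 1:4, col = cols,
         lwd = 2, bty = "n")
  invisible(file)
}

# "Editing" the plot is then a matter of calling the function again with
# different arguments.
f1 <- bias_plot(tempfile(fileext = ".pdf"))
f2 <- bias_plot(tempfile(fileext = ".pdf"),
                cols = c("black", "grey40", "grey60", "grey80"),
                main = "Estimated bias")
```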
Re: [R] Graphs for scientific publication ?
More useful to the r-help list would be a reproducible example of the data you are using and a clear statement of what you are trying to accomplish. It is likely that all of your requirements can be easily met, but you spent most of your message talking about what you have tried without telling us where you want to end up. People on the list are familiar with base graphics, lattice graphics, and ggplot2. If you list your requirements clearly, you might end up with three solutions. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Thursday, April 30, 2015 1:41 PM To: Jeremy Clark Cc: r-help@r-project.org Subject: Re: [R] Graphs for scientific publication ? Jeremy: I suggest you have a look at the latest edition of Paul Murrell's book, "R Graphics", as you seem to be unaware that ggplot2 (as well as a 3rd graphics paradigm, the lattice package) and base graphics are built on 2 different and incompatible graphics engines. Obviously, you are entitled to your opinions and graphical predilections vary, but I do not think R-Help is a good venue for these sorts of discussions. The R-devel list might be a better place to discuss such matters. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Thu, Apr 30, 2015 at 5:05 AM, Jeremy Clark wrote: > Dear All, > > First of all, many thanks to all R contributors for a fantastic > program, and especially to Hadley Wickham for creating ggplot2. The > following is intended to be a warning that, if the apparently > superficial problems described are not sorted out, R could well find > itself being superceded. 
The reason is that a new user wants to draw a > graph, and perhaps publish in a scientific journal a graph created > using R, well before wanting to do a complex regression (and the > latter is relatively easy). So here goes: > > 1) The saga of the straight line. I implemented a geom_abline - it > looked superb. Unfortunately I had to disable clip to allow text - now > my abline looked ridiculous. My search found plotrix: ablineclip - > fantastic I thought - but it applies to plot and not geom_plot. I > switched to geom_segment - the rendering looked trash. I switched to > geom_smooth - should work but as I don't know the x values beforehand > I'll have to clip a new dataframe - it that a hassle ? - Yes it is ! > > So my general question is - why isn't ggplot2 already part > of R base - or at least if someone is to create useful packages for > plot - perhaps a subtle hint could be made that they should also apply > to ggplot2 (and perhaps to lattice ?? - also personally I would scrap > qplot as an unnecessary distraction which is not easier to implement > than ggplot). In general duplication of packages for plot and ggplot > doesn't seem like a good idea. > > > 2) The saga of the italic letter. I found, to my dismay, that to > insert an italic letter into my plot I had to learn a whole new > language called plotmath - which wouldn't accept normal R coding, and > didn't even have normal control functions such as /n for a new line. > This is ridiculous (and I'm not sure how plotmath managed to get into > R base). > > So my question is, when is plotmath going to have a > complete overhaul to allow eg. "," instead of, or as well as, ~,~, and > normal control functions such as \n ? > > 3) A related question to (2) is: where is geom_textbox ? > > 4) Where are examples with scientific graph defaults ? 
(meaning a > two-axis graph which is publishable - I will post my own after this is > published in a year's time, but as suggested above, while the graph > looks good the implementation of this is not pretty). > > Having said that - good luck with implementation - and many thanks for > all your hard work ! > > Yours sincerely, > > Abiologist
Re: [R] Problem with predict.lm()
Since you passed a matrix to lm() and then a data.frame to predict(), predict can't match up what variables to use for the prediction so it falls back on the original data. This seems to work:

> set.seed(42)
> y <- rnorm(100)
> X <- matrix(rnorm(100*10), ncol=10)
> Xd <- data.frame(X)
> lm <- lm(y~., Xd)
> Xnew <- matrix(rnorm(100*20), ncol=10)
> Xnewd <- data.frame(Xnew)
> ynew <- predict(lm, newdata=Xnewd)
> head(ynew)
          1           2           3           4           5           6
 0.35404067  0.14073495 -0.45442499  0.31065562 -0.02091366  0.25358175
> head(predict(lm))
          1           2           3           4           5           6
 0.75474817  0.06024122 -0.27221466 -0.20344713  0.20218135 -0.24045859
>

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Martin Spindler
Sent: Wednesday, April 29, 2015 9:21 AM
To: r-help@r-project.org
Subject: [R] Problem with predict.lm()

Dear all,

the following example somehow uses the "old data" (X) to make the predictions, but not the new data Xnew as intended.

y <- rnorm(100)
X <- matrix(rnorm(100*10), ncol=10)
lm <- lm(y~X)
Xnew <- matrix(rnorm(100*20), ncol=10)
ynew <- predict(lm, newdata=as.data.frame(Xnew)) #prediction is not made for Xnew

How can I force predict.lm to use the new data?

Thank you very much for your efforts in advance!

Best,
Martin
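A variant of the approach above with explicit column names, so it is visible that predict() matches newdata to the model by name. The column names x1..x10 and the object names train, fit and newd are invented for this sketch. The last lines check the predictions against a manual computation from the coefficients.

```r
set.seed(42)
y <- rnorm(100)
X <- matrix(rnorm(100 * 10), ncol = 10)
colnames(X) <- paste0("x", 1:10)          # hypothetical predictor names

# Fit from a data frame so the model terms are x1, ..., x10 rather than
# a single matrix term called "X".
train <- data.frame(y = y, X)
fit   <- lm(y ~ ., data = train)

# New data: 200 rows, same column names as the training predictors.
Xnew <- matrix(rnorm(200 * 10), ncol = 10)
colnames(Xnew) <- paste0("x", 1:10)
newd <- data.frame(Xnew)

ynew <- predict(fit, newdata = newd)

# Manual check: intercept plus Xnew times the slope coefficients.
manual <- drop(cbind(1, Xnew) %*% coef(fit))

length(ynew)                               # one prediction per new row
all.equal(unname(ynew), unname(manual))
```

With lm(y ~ X), the only term is the matrix X; a newdata frame without a column named X leaves predict() to find X in the workspace, which is why the old data were used.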
Re: [R] cite publications in the package help file
Reproducible examples help. For package MASS do you mean? http://cran.r-project.org/web/packages/MASS/index.html Which provides information about the package and a link to the Reference manual: http://cran.r-project.org/web/packages/MASS/MASS.pdf In that manual data sets and functions contain a Source entry and/or References for that function or data set. For a large package such as MASS with over 150 functions/data sets, it would be unwieldy to put them all on the web page. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of carol white via R-help Sent: Tuesday, April 28, 2015 1:28 PM To: Duncan Murdoch; R-help Help Subject: Re: [R] cite publications in the package help file the main web page is meant the page when a package is accessed on CRAN. So is it possible on this page that the content of DESCRIPTION is displayed to display the related publications and also put the related publications so that they appear on the help pdf file? On Tuesday, April 28, 2015 7:37 PM, Duncan Murdoch wrote: On 28/04/2015 1:00 PM, carol white via R-help wrote: > To cite related publications, it seems that they can't be mentioned in > DESCRIPTION. Where to mention so that it appears on the 1st page of the pdf > help file and the package main web page? I'm not talking about what is > specified in inst/citation. The package help file (e.g. foo-package.Rd for package "foo") will be displayed first in the PDF, and is the first entry linked in the help page index for the package. I don't know what page you mean as the "package main web page". Duncan Murdoch [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Question about base::rank results
Apologies if this belabors the point, but let's look at your second example to see why order and rank are different:

> x <- c(12,34,15,77,78,22)
> names(x) <- 1:6
> x
 1  2  3  4  5  6
12 34 15 77 78 22

I've added names to the values so we can watch how they change. If we sort the numbers we get them in increasing order with their original indices:

> sort(x)
 1  3  6  2  4  5
12 15 22 34 77 78

The values are in order and the names show where each value came from originally. That sequence of index values is exactly what order(x) gives you:

> order(x)
[1] 1 3 6 2 4 5
> x[order(x)]
 1  3  6  2  4  5
12 15 22 34 77 78

The rank function gives you the relative size of the value, not its position in the original vector:

> rank(x)
1 2 3 4 5 6
1 4 2 5 6 3
> x[rank(x)]
 1  4  2  5  6  3
12 77 34 78 22 15

The second value has rank 4, but that is not its index which is 2. The value with index 4 is 77 so it shows up in the second position.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of J Robertson-Burns
Sent: Monday, April 27, 2015 2:34 PM
To: Giorgio Garziano; r-help@r-project.org
Subject: Re: [R] Question about base::rank results

There is a blog post on this topic:

http://www.portfolioprobe.com/2012/07/26/r-inferno-ism-order-is-not-rank/

Pat

On 26/04/2015 09:17, Giorgio Garziano wrote:
> Hi,
>
> I cannot understand why rank(x) behaves as outlined below.
> Based on the results of first x vector values ranking, which is as expected
> in my opinion,
> I cannot explain the following results.
>
>> x <- c(12,34,15,77,78)
>> x[rank(x)]
> [1] 12 15 34 77 78 (OK)
>
>> x <- c(12,34,15,77,78,22)
>> x[rank(x)]
> [1] 12 77 34 78 22 15 (?)
>
>> x <- c(12,34,77,15,78)
>> x[rank(x)]
> [1] 12 77 15 34 78 (?)
>
> Please any feedback ? Thanks.
>
> BR,
>
> Giorgio Garziano
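The relationship between the two functions can be checked directly. The sketch below uses the vectors from the question; it assumes there are no ties, since rank() averages tied values by default and the identities below would then no longer hold exactly.

```r
x <- c(12, 34, 15, 77, 78, 22)

ord <- order(x)   # where each sorted value comes FROM: 1 3 6 2 4 5
rnk <- rank(x)    # where each original value goes TO:  1 4 2 5 6 3

# order() is the index vector that sorts x.
sorted <- x[ord]
identical(sorted, sort(x))

# rank() is the inverse permutation: it puts the sorted values back
# into their original positions.
all(sorted[rnk] == x)

# With no ties, applying order() twice gives the ranks.
all(order(ord) == rnk)

# The first example in the question "worked" only because its
# permutation happens to be its own inverse.
x1 <- c(12, 34, 15, 77, 78)
all(order(x1) == rank(x1))
```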
Re: [R] - Obtaining superscripts to affix to means that are not significantly different from each other with R
The function cld() in package multcomp generates compact letter displays, but does not format them as exponents of the group names.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Joachim Audenaert
Sent: Thursday, April 23, 2015 4:58 AM
To: r-help@r-project.org
Subject: [R] - Obtaining superscripts to affix to means that are not significantly different from each other with R

Hello all,

It is often time consuming to interpret p-values of multiple pairwise comparisons of groups and assign them a letter code for publication purposes. So I found this interesting link to a program that does this for you.

http://www.jerrydallal.com/lhsp/similar.htm

I was wondering if something similar exists in R?

Met vriendelijke groeten - With kind regards,

Joachim Audenaert
onderzoeker gewasbescherming - crop protection researcher
PCS | proefcentrum voor sierteelt - ornamental plant research
Schaessestraat 18, 9070 Destelbergen, België
T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95
E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be

[[alternative HTML version deleted]]
Re: [R] Vectorizing a task
It is not vectorized, but it is simple: EXPANDED <- unlist(mapply(":", START, END)) ----- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Fisher Sent: Tuesday, April 14, 2015 2:36 PM To: r-h...@stat.math.ethz.ch Subject: [R] Vectorizing a task R 3.1.3 OS X Colleagues I have data of this sort: START <- c(1, 2, 3, 4, 8, 14, 15, 118, 118, 119, 202, 202, 203, 204) END <- c(1, 2, 3, 6, 13, 14, 117, 118, 118, 201, 202, 202, 203, 204) I would like to create a vector that looks like this: START.to.END<- c(1:1,2:2,3:3,4:6,8:13,14:14,15:117,118:118,118:118,119:201,202:202,202:202,203:203,204:204) i.e., each pair of entries is link with “:”, then these are concatenated. Ultimately, this will be expanded into: EXPANDED<- c(1L, 2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L, 118L, 119L, 120L, 121L, 122L, 123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L, 152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L, 163L, 164L, 165L, 166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L, 174L, 175L, 176L, 177L, 178L, 179L, 180L, 181L, 182L, 183L, 184L, 185L, 186L, 187L, 188L, 189L, 190L, 191L, 192L, 193L, 194L, 195L, 196L, 197L, 198L, 199L, 200L, 201L, 202L, 202L, 203L, 204L) The final step 
will be to find which values are missing from the sequence:

setdiff(1:max(EXPANDED), EXPANDED)

The command:

paste0("c(", paste(paste(ALLSTART, ALLEND, sep=":"), collapse=","), ")")

creates the text for START.to.END, but I can’t figure out how to evaluate that expression. I could build the vector step-by-step but that seems quite inefficient. Any suggestions?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
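A sketch of the whole task with the START and END vectors from the question. It uses Map() rather than mapply() because mapply() simplifies its result to a matrix when every range happens to have the same length, and Map() always returns a list. The eval(parse()) line shows how the pasted text could be evaluated, although building the ranges directly avoids the need for it.

```r
START <- c(1, 2, 3, 4, 8, 14, 15, 118, 118, 119, 202, 202, 203, 204)
END   <- c(1, 2, 3, 6, 13, 14, 117, 118, 118, 201, 202, 202, 203, 204)

# One integer sequence per START/END pair, then flatten.
ranges   <- Map(`:`, START, END)
EXPANDED <- unlist(ranges)

# Values between 1 and the maximum that no range covers.
missing_values <- setdiff(seq_len(max(EXPANDED)), EXPANDED)
missing_values

# The text-building route from the question also works once the string
# is parsed and evaluated.
txt <- paste0("c(", paste(START, END, sep = ":", collapse = ","), ")")
from_text <- eval(parse(text = txt))
all(from_text == EXPANDED)
```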
Re: [R] Extracting unique entries by a column
Try

all.equal(df[1,3], df[2,3])

This relates to how decimal numbers are stored in computers. It is not an R only issue, but it is described in the R-FAQ:

From the R-FAQ - http://cran.r-project.org/doc/FAQ/R-FAQ.html

7.31 Why doesn't R think these numbers are equal?

The only numbers that can be represented exactly in R's numeric type are integers and fractions whose denominator is a power of 2. Other numbers have to be rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then. For example

R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16

The function all.equal() compares two objects using a numeric tolerance of .Machine$double.eps ^ 0.5. If you want much greater accuracy than this you will need to consider error propagation carefully.

For more information, see e.g. David Goldberg (1991), "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys, 23/1, 5-48, also available via http://www.validlab.com/goldberg/paper.pdf.

To quote from "The Elements of Programming Style" by Kernighan and Plauger:

10.0 times 0.1 is hardly ever 1.0.

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Vikram Chhatre
Sent: Tuesday, April 14, 2015 2:40 PM
To: r-help
Subject: [R] Extracting unique entries by a column

I have a data frame of dim 3x600. There are pairs of rows which have the exact same value in column 3.
    head(df)
                   POP1        POP2   ABSDIFF
    L0005.01 0.98484848 0.688118812 0.2967297
    L0005.03 0.01515152 0.311881188 0.2967297
    L0008.02 0.97727273 0.004424779 0.9728479
    L0008.04 0.02272727 0.995575221 0.9728479
    L0012.03 0.98684211 0.004385965 0.9824561
    L0012.01 0.01315789 0.995614035 0.9824561

I want to unique sort on df$ABSDIFF so that only one row per pair remains in the subset.

    > df_subset <- df[!duplicated(df$ABSDIFF), ]

This does not work. So I literally checked:

    > identical(df[1,3], df[2,3])
    FALSE

How is 0.2967297 different from 0.2967297? I am puzzled. Thanks for any insight.

Vikram

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
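A minimal sketch of one way around the floating-point issue discussed in this thread: round the column to a fixed number of decimals before calling duplicated(). The data frame below is rebuilt from the first four rows printed in the question, and the choice of 6 decimal places is an assumption that should be adapted to the precision of the real data.

```r
# First four rows from the question, retyped for illustration
df <- data.frame(
  POP1 = c(0.98484848, 0.01515152, 0.97727273, 0.02272727),
  POP2 = c(0.688118812, 0.311881188, 0.004424779, 0.995575221),
  row.names = c("L0005.01", "L0005.03", "L0008.02", "L0008.04")
)
df$ABSDIFF <- abs(df$POP1 - df$POP2)

# Compare rounded values so that tiny representation differences
# do not make otherwise equal numbers look distinct
df_subset <- df[!duplicated(round(df$ABSDIFF, 6)), ]
```

Rounding can still separate two values that happen to straddle a rounding boundary, so for critical work a tolerance-based comparison in the spirit of all.equal() may be the safer choice.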
Re: [R] Sum of some months totals
Don't use html formatted emails and always copy the list on your replies. For example?

    rainstats <- function(data, months=3) {
        if (! months %in% c(1, 2, 3, 4, 6, 12)) stop("Months must divide into 12!")
        period <- 12/months
        grps <- rep(1:period, each=months)
        # Work on the data argument rather than the global rainfall object
        data$Group <- grps[data$Month]
        aggregate(Rain~Year+Group, data, function(x) c(sum=sum(x), days=sum(x>0)))
    }

    > rainstats(rainfall)
      Year Group Rain.sum Rain.days
    1 1979 10 0

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

From: Frederic Ntirenganya [mailto:ntfr...@gmail.com]
Sent: Tuesday, April 14, 2015 9:27 AM
To: David L Carlson
Subject: Re: [R] Sum of some months totals

Hi David,

I understand what you did. My aim is to make a function which takes a quarter as a default, i.e. I can compute, let's say, for 4 months by specifying it in the arguments of the function.

Regards,
Frederic.

Frederic Ntirenganya
Maseno University, African Maths Initiative, Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Tue, Apr 14, 2015 at 4:44 PM, David L Carlson wrote:

You should read some beginning tutorials for R before you go further. You are wasting a lot of time writing complicated loops that you do not need. R is probably very different from the programming languages you are used to. In these examples I called your data "rainfall."

To get the sum of the rain for each month you need only:

    aggregate(Rain~Year+Month, rainfall, sum)

To get the number of days with rain is slightly more complicated:

    aggregate(Rain~Year+Month, rainfall, function(x) sum(x>0))

To get the sum for a quarter, you need to add quarters to your data frame, e.g. as below. Notice that it does not require a loop to add an entire column to your existing data frame.

    rainfall$Quarter <- (rainfall$Month+2) %/% 3
    aggregate(Rain~Year+Quarter, rainfall, sum)

The command ?aggregate will bring up a manual page on the aggregate() function.
Read "Introduction to R" at http://cran.r-project.org/manuals.html and one or more of the contributed manuals at http://cran.r-project.org/other-docs.html - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Frederic Ntirenganya Sent: Tuesday, April 14, 2015 6:10 AM To: Adams, Jean Cc: r-help@r-project.org Subject: Re: [R] Sum of some months totals Hi Jean, Thanks for the help! How can I compute monthly total of rainfall? I want to compute both monthly total of rainfall and number of raindays. In below function month_tot is a table and I want to some month. default is 3 months. The loop for quarter is not working and I am wondering why it is not working. total = function(data, threshold = 0.85){ month_tot=matrix(NA,length(unique(data$Year)),12) rownames(month_tot)=as.character(unique(data$Year)) colnames(month_tot)=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec") raindays=month_tot # loop over months and years to get summary statistics for (mon in 1:12) { rain=data[data[2]==mon,c(1,4)] # rain just for a specific month for (yr in unique(data$Year)) { month_tot[yr-min(unique(data$Year)-1),mon]=sum(rain[rain[,1]==yr,2]) #print(sum(rain[rain[,1]==yr,2])) raindays[yr-min(unique(data$Year)-1),mon]=sum(rain[rain[,1]==yr,2]>threshold) } } month_tot 1:ncol(month_tot) #month_tot[,1] + month_tot[,2] + month_tot[,3] quarter <-c() i = 3 for (i in 1:ncol(month_tot)){ quarter[i] = sum(month_tot[,i]) } quarter } total(kitale) Regards, Frederic. Frederic Ntirenganya Maseno University, African Maths Initiative, Kenya. Mobile:(+254)718492836 Email: fr...@aims.ac.za https://sites.google.com/a/aims.ac.za/fredo/ On Tue, Apr 14, 2015 at 1:52 PM, Adams, Jean wrote: > If you want to calculate the number of days having greater than a certain > threshold of rain within a range of months, a function like this might > serve your needs. 
> > raindays <- function(data, monStart=1, monEnd=3, threshold=0.85) { > with(data, { > selRows <- Month >= monStart & Month <= monEnd & Rain > threshold > days <- tapply(selRows, Year, sum) > return(days) > }) > } > > raindays(kitale) > > Jean > > On Tue, Apr 14, 2015 at 2:46 AM, Frederic Ntirenganya > wrote: > >> I want to compute monthly summaries from daily data. I want to choose >> which >> month to start and how many months to total over. Default could be to >> start in Januar
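As a runnable illustration of the aggregate() approach suggested in this thread, here is a minimal sketch with a made-up data set. The column names Year, Month and Rain follow the thread; the data values, the output column names RainSum and RainDays, and the integer-division grouping are assumptions for the example, not the poster's actual data or code.

```r
# Synthetic daily data: two observations per month, one dry and one wet
rainfall <- data.frame(
  Year  = rep(c(1979, 1980), each = 24),
  Month = rep(rep(1:12, each = 2), times = 2),
  Rain  = rep(c(0, 2.5), times = 24)
)

rainstats <- function(data, months = 3) {
  if (!months %in% c(1, 2, 3, 4, 6, 12)) stop("Months must divide into 12!")
  # Months 1..months fall in group 1, the next block in group 2, and so on
  data$Group <- (data$Month - 1) %/% months + 1
  totals <- aggregate(Rain ~ Year + Group, data, sum)
  days   <- aggregate(Rain ~ Year + Group, data, function(x) sum(x > 0))
  names(totals)[3] <- "RainSum"
  totals$RainDays <- days$Rain
  totals
}

quarterly <- rainstats(rainfall)              # default: 3-month totals
four.month <- rainstats(rainfall, months = 4) # 4-month totals
```

Keeping the totals and the rain-day counts as two plain columns, rather than a matrix column returned by a single aggregate() call, tends to make the result easier to print and to merge with other data.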
Re: [R] R studio installation
R Studio loads R if it can find it. Since you have installed R, the error message means that R Studio can't find it or is not sure which version to use. The part of the message that says "please select the version of R to use" should give you a dialog box to use to navigate to the directory that contains R. Once you have done this, R Studio will remember where it is. The most likely explanation is one of these:

1. You installed R in the default location "C:\Program Files\R" but you have multiple installations as a result of updating R. By default R creates a new subdirectory for each new version. As a result R Studio does not know which one you want.

2. You installed both 32-bit and 64-bit versions of R so R Studio does not know which one to use.

3. You installed R in a location other than the default location and R Studio cannot find it.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Newmiller
Sent: Tuesday, April 14, 2015 7:43 AM
To: John Kane; Sojood Malkawi; r-help@r-project.org
Subject: Re: [R] R studio installation

But if the answer to the question "Does R load on its own?" is "no" then this probably is the right place to ask for help. Of course, I would probably just suggest re-installing R, but someone else here might have better answers.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On April 14, 2015 5:14:21 AM PDT, John Kane wrote:
>You probably should go to the RStudio help/blog rather than here. This
>is not an RStudio list and the expertise is at the RStudio site.
>
>Does R load on its own?
>
>What OS are you using?
> >John Kane >Kingston ON Canada > > >> -Original Message- >> From: sojoodmlk1...@gmail.com >> Sent: Tue, 14 Apr 2015 11:45:05 +0300 >> To: r-help@r-project.org >> Subject: [R] R studio installation >> >> I installed R and then R studio but it doesn't open every time i try >to >> open it it gives me this message "Rstudio requires an existing >> installation of R in order to work. please select the version of R to >use >> ". >> i'm using R i386 3.1.3 and downloaded RStudio 0.98.1103 - Windows >> XP/Vista/7/8. do you have any idea what the problem is? >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > >Can't remember your password? Do you need a strong and secure password? >Use Password manager! It stores your passwords & protects your account. > >__ >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Convert color hex code to color names
Actually all 6 colors in rainbow(6) do have names. I missed the fact that rainbow() adds an alpha value that we need to strip off before comparing to the values in clrs$RGB:

    > rain <- substr(rain, 1, 7)
    > sum(clrs$RGB %in% rain)
    [1] 12

So there are two color names for each color in rainbow(6):

    > for (i in 1:6) cat(i, colors()[clrs$RGB==rain[i]], "\n")
    1 red red1
    2 yellow yellow1
    3 green green1
    4 cyan cyan1
    5 blue blue1
    6 magenta magenta1

David C

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Monday, April 13, 2015 12:07 PM
To: Boris Steipe; Alejo C.S.
Cc: r-help@r-project.org
Subject: Re: [R] Convert color hex code to color names

And expanding at a more elementary level. The reason you need to find the smallest difference is that not all of the possible colors have names. There are 256^3 = 16,777,216 possible rgb color designations, but only 657 named colors. You can create a data frame of the named colors and their rgb designations using

    > clrs <- data.frame(Color=colors(), RGB=rgb(t(col2rgb(colors())), maxColorValue=255), stringsAsFactors=FALSE)
    > str(clrs)
    'data.frame': 657 obs. of 2 variables:
     $ Color: chr "white" "aliceblue" "antiquewhite" "antiquewhite1" ...
     $ RGB  : chr "#FFFFFF" "#F0F8FF" "#FAEBD7" "#FFEFDB" ...
    > head(clrs)
              Color     RGB
    1         white #FFFFFF
    2     aliceblue #F0F8FF
    3  antiquewhite #FAEBD7
    4 antiquewhite1 #FFEFDB
    5 antiquewhite2 #EEDFCC
    6 antiquewhite3 #CDC0B0

So most colors do not have names. In your example, none of the colors in rainbow(6) have names:

    > rain <- rainbow(6)
    > sum(clrs$RGB %in% rain)
    [1] 0

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Boris Steipe
Sent: Monday, April 13, 2015 11:44 AM
To: Alejo C.S.
Cc: r-help@r-project.org
Subject: Re: [R] Convert color hex code to color names

To add slightly to that: What you want to do is write a function that returns the named color that has the smallest difference to your input hex-triplet. But note that color difference is a large topic. Assuming you want to minimize *perceptual* differences, you want to calculate your differences in Lab color space. The function convertColor() has the option to convert hex to Lab. Example:

    convertColor(t(col2rgb("thistle")), from="sRGB", to="Lab", scale.in=255)

Within Lab space, you can take the Euclidean distance.

That all said, I can't imagine why one would want to do this in the first place - color triplets are much more convenient than label strings :-)

B.

On Apr 13, 2015, at 11:45 AM, Thierry Onkelinx wrote:

> A combination of rgb(), col2rgb() and colors() can give hex values for the
> named colors.
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2015-04-13 17:28 GMT+02:00 Alejo C.S. :
>
>> Hi all, I want to convert the output of:
>>
>>> rainbow(6)
>>
>>> [1] "#FF0000FF" "#FFFF00FF" "#00FF00FF" "#00FFFFFF" "#0000FFFF"
>> "#FF00FFFF"
>>
>> To a vector of color names. Any tip?
>>
>>
>> Thanks in advance
>>
>> C.
>> >>[[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commen
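Putting the suggestions in this thread together, here is a minimal sketch of a nearest-named-color lookup. The function name nearest_color_name() is hypothetical; colors(), col2rgb() and convertColor() are the grDevices functions mentioned above. It assumes hex input with or without a trailing alpha pair and uses plain Euclidean distance in Lab space, which is only one of several possible color-difference measures.

```r
# Hypothetical helper: name of the closest named color for each hex code
nearest_color_name <- function(hex) {
  hex <- substr(hex, 1, 7)                      # drop an alpha suffix if present
  named <- colors()
  # Lab coordinates of all named colors (one row per color)
  named_lab <- convertColor(t(col2rgb(named)), from = "sRGB", to = "Lab",
                            scale.in = 255)
  # Lab coordinates of the input colors, forced to a 3-column matrix
  target_lab <- matrix(convertColor(t(col2rgb(hex)), from = "sRGB", to = "Lab",
                                    scale.in = 255), ncol = 3)
  apply(target_lab, 1, function(p) {
    d <- sqrt(rowSums(sweep(named_lab, 2, p)^2)) # distance to every named color
    named[which.min(d)]                          # first closest match
  })
}

rain_names <- nearest_color_name(rainbow(6))
```

When several names share the same RGB value (such as red and red1), which.min() simply returns the first one in colors() order.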
Re: [R] how to Subset based on partial matching of columns?
From Sarah's data frame you can get what you want directly with the table() function which will create a table object, mydf.tbl. If you want a data frame you need to convert the table using as.data.frame.matrix() to make mydf.df. Finally combine the two data frames if your x column consists of unique values in ascending order to make mydf.all.

    > mydf.tbl <- table(mydf$x, mydf$code)
    > mydf.tbl

        LGTY MY GM+ RS TY
      1    0      1  0  0
      2    1      0  0  0
      3    0      0  1  0
      4    0      0  0  1
    > mydf.df <- as.data.frame.matrix(mydf.tbl)
    > mydf.df
      LGTY MY GM+ RS TY
    1    0      1  0  0
    2    1      0  0  0
    3    0      0  1  0
    4    0      0  0  1
    > mydf.all <- data.frame(mydf, mydf.df)
    > mydf.all
      x   code LGTY MY.GM. RS TY
    1 1 MY GM+    0      1  0  0
    2 2   LGTY    1      0  0  0
    3 3     RS    0      0  1  0
    4 4     TY    0      0  0  1

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of samarvir singh
Sent: Thursday, April 9, 2015 8:50 AM
To: Sarah Goslee
Cc: r-help
Subject: Re: [R] how to Subset based on partial matching of columns?

Thank you, Sarah Goslee. I am rather new in learning R, so people like you are great support. Really appreciate you taking the time to correct my mistakes. Thanks

On Thu 9 Apr, 2015 6:54 pm Sarah Goslee wrote:

> Hi,
>
> Please don't put quotes around your code. It makes it hard to copy and
> paste. Alternatively, don't post in HTML, because it screws up your
> code.
>
> On Wed, Apr 8, 2015 at 8:57 PM, samarvir singh
> wrote:
> > So I have a list that contains certain characters as shown below
> >
> > `list <- c("MY","GM+" ,"TY","RS","LG")`
>
> That's a character vector, not a list. A list is a specific type of object
> in R.
>
> > And I have a variable named "CODE" in the data frame as follows
> >
> > `code <- c("MY GM+", ,"LGTY", "RS","TY")`
>
> That doesn't work, and I have no idea what you expect to have there,
> so I'm deleting the extra comma. Also, your vector is named code, not
> CODE.
> > code <- c("MY GM+", "LGTY", "RS","TY") > x <- c(1:4) > > > 'x <- c(1:5) > > `df <- data.frame(x,code)` > > You problably actually want > mydf <- data.frame(x, code, stringsAsFactors=FALSE) > > Note I changed the name, because df() is a base R function. > > > > Now I want to create 5 new variables named "MY","GM+","TY","RS","LG" > > > > Which takes binary value, 1 if there's a match case in the CODE variable > > > > df > > x code MY GM+ TY RS LG > > 1 MY GM+ 1 1 00 0 > > 2 0 0 00 0 > > 3 LGTY 0 0 1 0 1 > > 4 RS 0 0 010 > > 5 TY 0 0 100 > > grepl() will give you a logical match > > data.frame(mydf, sapply(code, function(x)grepl(x, mydf$code)), > stringsAsFactors=FALSE, check.names=FALSE) > > Sarah > > > -- > Sarah Goslee > http://www.functionaldiversity.org > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
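A minimal sketch of the indicator-column idea from this thread, for readers who want one 0/1 column per search string. It assumes the goal is a substring match of each code ("MY", "GM+", "TY", "RS", "LG") within the code column; fixed = TRUE is used so that the "+" in "GM+" is treated literally rather than as a regular-expression operator. The object names codes, flags and result are made up for the example.

```r
codes <- c("MY", "GM+", "TY", "RS", "LG")
mydf <- data.frame(x = 1:4,
                   code = c("MY GM+", "LGTY", "RS", "TY"),
                   stringsAsFactors = FALSE)

# One column per code: 1 if the code occurs anywhere in mydf$code, else 0
flags <- sapply(codes, function(p) as.integer(grepl(p, mydf$code, fixed = TRUE)))

# check.names = FALSE keeps the column name "GM+" intact
result <- data.frame(mydf, flags, check.names = FALSE)
```

Unlike the table() approach, this matches parts of a string, so "LGTY" is flagged for both "LG" and "TY".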
Re: [R] sort adjacency matrix
The answer depends on what kind of matrix/data frame you have. That is why we encourage people to use dput() to create a copy of the sample data in their email. Some combination of the order() and rowSums() functions will probably get you what you want. For example,

    dat[order(rowSums(dat=="1"), decreasing=TRUE),]

or

    dat[order(rowSums(dat), decreasing=TRUE),]

or

    dat[order(rowSums(dat, na.rm=TRUE), decreasing=TRUE),]

Note that the order is not unique since there are ties in the number of 1s.

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ragia Ibrahim
Sent: Monday, April 6, 2015 12:18 PM
To: r-help@r-project.org
Subject: [R] sort adjacency matrix

Dear group,

I have the following matrix

    1  . . 1 . . 1 . . . .
    2  . . . . . . 1 . . .
    3  1 . . . 1 . . 1 . 1
    4  . . . . . 1 . . . .
    5  . . 1 . . . . . . 1
    6  1 . . 1 . . . . 1 .
    7  . 1 . . . . . 1 . .
    8  . . 1 . . . 1 . . 1
    9  . . . . . 1 . . . 1
    10 . . 1 . 1 . . 1 1 .

I want to sort it by the number of ones in each row, in descending order (rows with the most ones first), to be as follows

    3  1 . . . 1 . . 1 . 1
    10 . . 1 . 1 . . 1 1 .
    6  1 . . 1 . . . . 1 .
    8  . . 1 . . . 1 . . 1
    1  . . 1 . . 1 . . . .
    5  . . 1 . . . . . . 1
    7  . 1 . . . . . 1 . .
    9  . . . . . 1 . . . 1
    2  . . . . . . 1 . . .
    4  . . . . . 1 . . . .

How can I do this in R? Thanks in advance

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
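A minimal sketch of the order()/rowSums() suggestion on a small numeric 0/1 matrix. The matrix adj and its row names are made up for illustration; the poster's real object may be a sparse matrix or a data frame, in which case the call would need adapting.

```r
# Small made-up adjacency matrix with named rows
adj <- matrix(c(1, 0, 0, 1,
                0, 0, 0, 0,
                1, 1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), NULL))

# Rows with the most ones first; drop = FALSE keeps a matrix even for one row
sorted <- adj[order(rowSums(adj), decreasing = TRUE), , drop = FALSE]
```

Rows with the same number of ones can come out in any relative order unless a further sort key is supplied to order().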
Re: [R] Again: A problem someone should know about
In your first example you created logfat.lm and then tried to plot logfat so you got an error indicating that logfat did not exist. In your second example we have no idea what body.fat is. You must make your examples reproducible so that we can reproduce your error. It looks like you could also benefit from spending a little time learning about R using a free tutorial.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Lester
Sent: Monday, March 30, 2015 12:59 AM
To: r-help@r-project.org
Subject: [R] Again: A problem someone should know about

I have no idea what to do

> plot(body.fat, BMI,xlab="Body fat",ylab="BMI",main=“Figure 2.1: BMI vs Body
> fat (n=252)”)
Error: unexpected input in "plot(body.fat, BMI,xlab="Body fat",ylab="BMI",main=�"
> plot(body.fat, BMI,xlab="Body fat",ylab="BMI")

serious error. This application, or a library it uses, is using an invalid context and is thereby contributing to an overall degradation of system stability and reliability. This notice is a courtesy: please fix this problem. It will become a fatal error in an upcoming update.

> Begin forwarded message:
>
> From: Ian Lester
> Reply-To: ihles...@mensa.org.au
> Subject: A problem someone should know about
> Date: 30 March 2015 9:52:54 am AEDT
> To: r-help@r-project.org
>
> I’m a novice and this message looks like it shouldn’t be ignored. Someone who
> knows what they’re doing should probably take a look.
> Thanks
> Ian Lester
>
>> logfat.lm<-(lm(body.fat~log(BMI)))
>> plot(logfat)
> Error in plot(logfat) : object 'logfat' not found
>> plot(logfat.lm)
> Hit to see next plot:
> Hit to see next plot:
> Hit to see next plot:
> Mar 29 18:10:18 iansimac.gateway rsession[69550] : Error: this
> application, or a library it uses, has passed an invalid numeric value (NaN,
> or not-a-number) to CoreGraphics API.
This is a serious error and contributes > to an overall degradation of system stability and reliability. This notice is > a courtesy: please fix this problem. It will become a fatal error in an > upcoming update. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] textplot() in wordcloud package
Another possibility is to use pointLabel() in package maptools. For your example library(maptools) plot(x,y) pointLabel(x, y, text1) Advantages of pointLabel() are that it returns a list of the x and y coordinates of the labels that you can tweak if necessary and, at least in your example, it does a better job of avoiding labels being chopped at the plot margins. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson Sent: Monday, March 16, 2015 10:44 AM To: Fraser D. Neiman; r-help@r-project.org Subject: Re: [R] textplot() in wordcloud package You should contact the package maintainer about this. The problem is that the pos= argument is being passed to strwidth() and strheight() and those functions do not know what to do with it. In the meantime: suppressWarnings(textplot(x,y, text1, new=F, show.lines=F, pos=4)) will eliminate the warnings. ----- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Fraser D. Neiman Sent: Friday, March 13, 2015 3:29 PM To: r-help@r-project.org Subject: [R] textplot() in wordcloud package Dear All, The textplot() function in the wordcloud package seem to do a good job with generating non-overlapping labels on a scatter plot. But it throws "warnings" when I try to use the pos= parameter to position the text labels relative to a given x-y point. Here is a simple example: x<-runif(100) y<-runif(100) text1<- rep('LAB', 100) plot(x,y) textplot(x,y, text1, new=F, show.lines=F, pos=4) There were 50 or more warnings (use warnings() to see the first 50) > warnings() Warning messages: 1: In strwidth(words[i], cex = cex[i], ...) : "pos" is not a graphical parameter 2: In strheight(words[i], cex = cex[i], ...) 
: "pos" is not a graphical parameter How can I pass the pos=parameter to text() without generating the warnings? I am doubly puzzled by the warnings because in the graph that results from the foregoing code, The labels are to the right of the points, as 'pos=4' requests. Thanks! Fraser D. Neiman __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding Column to a Data Frame
merge() combines two data frames at a time, not three. Maybe

rich.stats2 = merge(rich.stats, Month, by="X.SampleID")
rich.stats3 = merge(rich.stats2, Location, by="X.SampleID")

Reading the manual page will help:

?merge

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Lauren O'Connell
Sent: Wednesday, March 11, 2015 11:02 AM
To: r-help@r-project.org
Subject: [R] Adding Column to a Data Frame

I am trying to add a column with data to a data frame that already has information but am getting this error:

> rich.stats4 = merge(rich.stats,Location,Month,by="X.SampleID")
Error in fix.by(by.x, x) :
  'by' must specify one or more columns as numbers, names or logical

I have two separate data frames that contain my sample names with location and with month:

#Create data frame of all sample names and month sample was taken
Month = data.frame(X.SampleID=sample_data(Lauren5000)$X.SampleID,MonthSampleTaken=sample_data(Lauren5000)$MonthSampleTaken)
head(Month)

#Create data frame of all samples names and location of sample
Location = data.frame(X.SampleID=sample_data(Lauren5000)$X.SampleID,Location=sample_data(Lauren5000)$Location)
head(Location)

I was able to add my "MonthSampleTaken" variable by using this command:

> rich.stats2 = merge(rich.stats, Month,by="X.SampleID")
> head(rich.stats2)
  X.SampleID    mean       sd MonthSampleTaken
1      PE101 1421.34 19.44961         February
2      PE102 1336.24 25.43882         February
3      PE104 1418.75 21.92889            March
4      PE105 1331.03 20.55712            March
5      PE107 1320.21 20.91942            March
6      PE108 1328.41 20.49247            March

I now want to add my sample site location, but can't figure out how to do this. Any help would be greatly appreciated.

Cheers,
Lauren
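When more than two data frames share the same key, the pairwise merges suggested in this thread can be folded into one call. The following is only a sketch: the three small data frames are invented stand-ins for rich.stats, Month and Location, and the site names are hypothetical.

```r
# Toy stand-ins for the poster's data frames (values invented for illustration)
rich.stats <- data.frame(X.SampleID = c("PE101", "PE102", "PE104"),
                         mean = c(1421.34, 1336.24, 1418.75),
                         sd = c(19.44961, 25.43882, 21.92889))
Month <- data.frame(X.SampleID = c("PE101", "PE102", "PE104"),
                    MonthSampleTaken = c("February", "February", "March"))
Location <- data.frame(X.SampleID = c("PE101", "PE102", "PE104"),
                       Location = c("SiteA", "SiteB", "SiteA"))  # hypothetical sites

# merge() still sees only two data frames per call; Reduce() chains the calls
rich.stats3 <- Reduce(function(x, y) merge(x, y, by = "X.SampleID"),
                      list(rich.stats, Month, Location))
rich.stats3
```

The result has one row per sample ID with the columns of all three inputs.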
Re: [R] calculate value in dependence of target value
This works for your example data, but I'd recommend testing it carefully before using it.

> dat <- data.frame(ID=11:14, VALUE=c(1, 5, 3, 2)*10000)
> HURD <- c(50, 75, 100)*1000
> PCT <- c(.02, .04, .08, .1)
> dat$CVALUE <- cumsum(dat$VALUE)
> dat$LVALUE <- dat$CVALUE - dat$VALUE
> dat
  ID VALUE CVALUE LVALUE
1 11 10000  10000      0
2 12 50000  60000  10000
3 13 30000  90000  60000
4 14 20000 110000  90000
>
> for (idx in seq_len(nrow(dat))) {
+ rng <- sort(c(HURD, unlist(dat[idx,3:4])))
+ a <- which(names(rng) == "LVALUE")
+ b <- which(names(rng) == "CVALUE")
+ diff(rng[a:b])
+ ng <- length(diff(rng[a:b]))
+ dat$MARGE[idx] <- sum(PCT[a:(a+ng-1)]* diff(rng[a:b]))
+ }
> dat
  ID VALUE CVALUE LVALUE MARGE
1 11 10000  10000      0   200
2 12 50000  60000  10000  1200
3 13 30000  90000  60000  1800
4 14 20000 110000  90000  1800

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us]
Sent: Monday, March 9, 2015 2:22 PM
To: Matthias Weber
Cc: David L Carlson; r-help@r-project.org
Subject: Re: [R] calculate value in dependence of target value

> target <- 100000
>
> breakpts <- data.frame( PctTarget=c(50,75,100,Inf), Mult=c(2,4,8,10) )
> breakpts$LastPct <- c( 0, breakpts$PctTarget[ -nrow( breakpts ) ] )
> breakpts$Range <- cut( breakpts$PctTarget, c( 0, breakpts$PctTarget ), include.lowest=TRUE )
> breakpts$DeltaPct <- with( breakpts, diff( c( 0, PctTarget ) ) )
> breakpts$CumMARGE <- target / 1e4 * with( breakpts, cumsum( DeltaPct * Mult ) )
> breakpts$LastCumMARGE <- c( 0, breakpts$CumMARGE[ -nrow( breakpts ) ] )
>
> dta <- data.frame( ID=11:14, VALUE=c(10000,50000,30000,20000) )
> dta$CumVALUE <- cumsum( dta$VALUE )
> dta$CumPct <- 100 * dta$CumVALUE / target
> dta$Range <- cut( dta$CumPct, c( 0, breakpts$PctTarget ), include.lowest=TRUE )
>
> dta
  ID VALUE CumVALUE CumPct     Range
1 11 10000    10000     10    [0,50]
2 12 50000    60000     60   (50,75]
3 13 30000    90000     90  (75,100]
4 14 20000   110000    110 (100,Inf]
> breakpts
  PctTarget Mult LastPct     Range DeltaPct CumMARGE LastCumMARGE
1        50    2       0    [0,50]       50     1000            0
2        75    4      50   (50,75]       25     2000         1000
3       100    8      75  (75,100]       25     4000         2000
4       Inf   10     100 (100,Inf]      Inf      Inf         4000
>
> #dta2 <- merge( dta, breakpts, all.x=TRUE, by="Range" )
> #dta2 <- dta2[ order( dta2$ID ), ]
>
> dta2 <- cbind( dta, breakpts[ match( dta$Range, breakpts$Range ), -which( "Range"==names( breakpts ) ) ] )
>
> dta2$CumMARGE <- with( dta2, Mult/100 * ( CumVALUE - target * LastPct / 100 ) + LastCumMARGE )
> dta2$MARGE <- with( dta2, diff( c( 0, CumMARGE ) ) )
>
> dta2
  ID VALUE CumVALUE CumPct     Range PctTarget Mult LastPct DeltaPct CumMARGE LastCumMARGE MARGE
1 11 10000    10000     10    [0,50]        50    2       0       50      200            0   200
2 12 50000    60000     60   (50,75]        75    4      50       25     1400         1000  1200
3 13 30000    90000     90  (75,100]       100    8      75       25     3200         2000  1800
4 14 20000   110000    110 (100,Inf]       Inf   10     100      Inf     5000         4000  1800
Re: [R] calculate value in dependence of target value
It is very hard to figure out what you are trying to do.

1. All of the VALUEs are greater than the target of 100.
2. Your description of what you want does not match your example.

Perhaps VALUE should be divided by 1000 (e.g. not 10000, but 10)? Perhaps your targets do not apply to VALUE, but to cumulative VALUE?

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthias Weber
Sent: Monday, March 9, 2015 7:46 AM
To: r-help@r-project.org
Subject: [R] calculate value in dependence of target value

Hello all,

I have a little problem. Maybe someone can help me.

I have to calculate a new column that depends on a target value. As an example: my target value is 100.000.
At the moment I have a data.frame with the following values.

  ID VALUE
1 11 10000
2 12 50000
3 13 30000
4 14 20000

The new column ("MARGE") should be calculated with the following graduation:

Until the VALUE reaches 50% of the target value (50.000) = 2%
Until the VALUE reaches 75% of the target value (75.000) = 4%
Until the VALUE reaches 100% of the target value (<100.000) = 8%
If the VALUE goes above 100% of the target value (>100.000) = 10%

The result looks like this:

  ID VALUE MARGE
1 11 10000   200 (result of 10.000 * 2%)
2 12 50000  1200 (result of 40.000 * 2% + 10.000 * 4%)
3 13 30000  1800 (result of 15.000 * 4% + 15.000 * 8%)
4 14 20000  1800 (result of 10.000 * 8% + 10.000 * 10%)

Is there any way to calculate the column "MARGE" automatically in R?

Thanks a lot for your help.

Best regards.

Mat

This e-mail may contain trade secrets, privileged, undisclosed or otherwise confidential information. If you have received this e-mail in error, you are hereby notified that any review, copying or distribution of it is strictly prohibited. Please inform us immediately and destroy the original transmittal. Thank you for your cooperation.
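The graduated rates in the question apply to the cumulative VALUE, so one way to read the problem is "margin on each row = increase in the total margin earned so far". The sketch below follows that reading. The target and VALUEs come from the worked results in the original post; the helper name cum_marge and the vectors lower, upper and rate are made up for illustration.

```r
# Values implied by the worked example in the question
target <- 100000
dat <- data.frame(ID = 11:14, VALUE = c(10000, 50000, 30000, 20000))

# Tier boundaries on the cumulative amount and the rate inside each tier
lower <- c(0, 0.50, 0.75, 1.00) * target
upper <- c(0.50, 0.75, 1.00, Inf) * target
rate  <- c(0.02, 0.04, 0.08, 0.10)

# Total margin on a cumulative amount x: the part of x that falls inside
# each tier, multiplied by that tier's rate, summed over tiers
cum_marge <- function(x) {
  sapply(x, function(xi) sum(rate * pmax(0, pmin(xi, upper) - lower)))
}

dat$CumVALUE <- cumsum(dat$VALUE)
# Margin per row is the increase in cumulative margin
dat$MARGE <- diff(c(0, cum_marge(dat$CumVALUE)))
dat
```

For these four rows MARGE comes out as 200, 1200, 1800 and 1800, matching the results requested in the question.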
Re: [R] subset a data frame by largest frequencies of factors
These two commands will compute the cell frequencies and then sort them:

e <- as.data.frame(xtabs(~ctry+member, Dataset))
f <- e[order(e$Freq, decreasing=TRUE),]

Then draw your subset, either the ten most frequent cells

g <- head(f, 10)

or the most frequent cells that together account for up to 80% of the total

g <- f[cumsum(f$Freq)/sum(f$Freq) < .8,]

Finally merge the sample with the original data and delete the unused factor levels:

sample <- merge(Dataset, g[,-3])
sample$ctry <- factor(sample$ctry)
sample$member <- factor(sample$member)

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Friendly
Sent: Thursday, March 5, 2015 12:45 PM
To: R-help
Subject: [R] subset a data frame by largest frequencies of factors

A consulting client has a large data set with a binary response (negative) and two factors (ctry and member) which have many levels, but many occur with very small frequencies. It is far too sparse with a model like glm(negative ~ ctry+member, family=binomial).

> str(Dataset)
'data.frame': 10672 obs. of 5 variables:
 $ ctry    : Factor w/ 31 levels "Barbados","Belize",..: 21 21 5 22 18 18 18 18 26 18 ...
 $ member  : Factor w/ 163 levels "","ADHOPIA, PREETI ",..: 150 19 19 111 120 1 1 4 55 18 ...
 $ negative: int 0 1 0 1 1 1 1 0 0 0 ...
>

For analysis, we'd like to subset the data to include only those that occur with frequency greater than a given value, or the top 10 (say) in frequency, or the highest frequency categories accounting for 80% (say) of the total.

I'm not sure how to do any of these in R. Can anyone help?

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
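The three subsetting rules asked about in this thread (minimum frequency, top k, cumulative share) can be tried side by side on simulated data. This is a sketch only: Dataset here is invented toy data with the same column names, and the thresholds (10 observations, 5 cells, 80%) are arbitrary choices.

```r
set.seed(1)  # toy data, invented for illustration
Dataset <- data.frame(
  ctry   = factor(sample(c("A", "B", "C", "D"), 200, replace = TRUE,
                         prob = c(0.5, 0.3, 0.15, 0.05))),
  member = factor(sample(c("m1", "m2", "m3"), 200, replace = TRUE,
                         prob = c(0.6, 0.3, 0.1))),
  negative = rbinom(200, 1, 0.4)
)

# Cell frequencies, largest first
f <- as.data.frame(xtabs(~ ctry + member, Dataset))
f <- f[order(f$Freq, decreasing = TRUE), ]

g.min <- f[f$Freq >= 10, ]                          # (a) at least 10 observations
g.top <- head(f, 5)                                 # (b) the 5 most frequent cells
g.80  <- f[cumsum(f$Freq) / sum(f$Freq) <= 0.8, ]   # (c) top cells covering <= 80%

# Keep only rows of Dataset that fall in the chosen cells; drop unused levels
sub <- droplevels(merge(Dataset, g.80[, c("ctry", "member")]))
nrow(sub)
```

droplevels() does the same job as re-applying factor() to each column, and merge() matches on the shared ctry and member columns.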
Re: [R] Using dates in R
Wow! A bold prediction from someone who has done exactly zero investigation of the basic, built-in date/time features in R. Since your example did not include the first two digits of the year, I've used %y instead of %Y. That will assume "19" precedes values from 69-99 and "20" precedes values from 00 to 68. If you decide to implement this with a for loop, it means you have much more to learn.

> today <- "3/4/15"
> d1 <- "2/15/80"
> d2 <- "2/15/16"
> # Is d before today, if so 0, otherwise 1
> as.integer(strptime(today, "%m/%d/%y") < strptime(d1, "%m/%d/%y"))
[1] 0
> as.integer(strptime(today, "%m/%d/%y") < strptime(d2, "%m/%d/%y"))
[1] 1

?strptime for details

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Hamel
Sent: Wednesday, March 4, 2015 8:55 AM
To: r-help@r-project.org
Subject: [R] Using dates in R

Hi all,

I have a dataset that includes a "date" variable. Each observation includes a date in the form of 2/15/15, for example. I'm looking to create a new indicator variable that is based on the date variable. So, for example, if the date is earlier than today, I would need a "0" in the new column, and a "1" otherwise. Note that my dataset includes dates from 1979-2012, so it is not one-year (this means I can't easily create a new variable 1-365).

How does R handle dates? My hunch is "not well," but perhaps there is a package that can help me with this. Let me know if you have any recommendations as to how this can be done relatively easily.

Thanks! Appreciate it.

Best,
Brian
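The same comparison works on a whole column at once, without a loop. A small sketch, assuming the dates sit in a character column (the data frame dta, its column names and the cutoff are made up here); a fixed cutoff stands in for Sys.Date() so the output does not change from day to day.

```r
# Hypothetical data frame; the column name `date` is an assumption
dta <- data.frame(date = c("2/15/79", "7/4/99", "2/15/12"),
                  stringsAsFactors = FALSE)

# %y reads two-digit years: 69-99 become 19xx, 00-68 become 20xx
dta$date2 <- as.Date(dta$date, format = "%m/%d/%y")

# Fixed cutoff for reproducibility; use Sys.Date() to compare with today
cutoff <- as.Date("2000-01-01")
dta$indicator <- as.integer(dta$date2 >= cutoff)  # 0 if earlier than cutoff, else 1
dta
```

Because Date objects compare element-wise, the indicator is computed for every row in one step.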
Re: [R] sampling dataframe based upon number of record occurrences
I'm not sure I understand, but I think you have a large data frame with records and you want to construct a sample of that data frame that includes no more than 3 records for each IDbyYear combination? You say there are 5589 unique combinations and your code uses a data frame called fitting_set. Assuming this is the data frame you are describing, your code will select all of the lines since fitting_set$IDbyYear[i] is always a vector of length 1.

We need a reproducible example. The best way for you to give us that would be to copy the result of dput(head(fitting_set, 10)). It would look something like this plus the 6 other columns you mention except that I've added dta <- in front of structure() to create a data frame:

dta <- structure(list(IDbyYear = c(42.24, 42.24, 42.24, 42.24, 42.24,
42.24, 45.32, 45.32, 45.36, 45.4, 45.4), SiteID = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("A-Airport",
"A-Bark Corral East"), class = "factor"), Year = c(2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2008L, 2008L, 2009L, 2010L, 2010L
)), .Names = c("IDbyYear", "SiteID", "Year"), class = "data.frame",
row.names = c(NA, -11L))

Now create a list of data frames, one for each IDbyYear:

dta.list <- split(dta, dta$IDbyYear)

Now a function that will select 3 rows or all of them if there are fewer:

smp <- function(dframe) {
    ind <- seq_len(nrow(dframe))
    dframe[sample(ind, ifelse(length(ind)>2, 3, length(ind))),]
}

Now take the samples and combine them into a single data frame:

sample <- do.call(rbind, lapply(dta.list, smp))
sample

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Curtis Burkhalter
Sent: Tuesday, March 3, 2015 3:23 PM
To: r-help@r-project.org
Subject: [R] sampling dataframe based upon number of record occurrences

Hello everyone,

I'm having trouble performing a task that is probably very simple, but can't seem to figure out how to get my code to work. What I want to do is use the sample function to pick records within a data frame, but only if a column attribute value is repeated more than 3 times. So if you look at the data below I have created a unique attribute value that corresponds to every site by year combination (i.e. IDxYear).

So you can see that the site called "A-Airport" was sampled 6 times in 2006, and "A-Bark Corral East" was sampled twice in 2008. So what I want to do is randomly select 3 of the existing 6 records for "A-Airport" in 2006, but for "A-Bark Corral East" in 2008 I just want to leave these records as they currently are.

I've used the following code to try and accomplish this, but like I said I can't get it to work so I'm clearly doing something wrong. If you could check out the code and provide any suggestions that would be great. It should be noted that there are 5589 unique IDxYear combinations so that's why that number is in the code. If any further clarification is needed also let me know.

boom=data.frame()
for (i in 1:5589){
boom[i,]=ifelse(length(fitting_set$IDbyYear[i]>3),fitting_set[sample(nrow(fitting_set),3),],fitting_set)
}
boom

IDbyYear  SiteID              Year  (6 other column attributes)
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
45.32     A-Bark Corral East  2008
45.32     A-Bark Corral East  2008
45.36     A-Bark Corral East  2009
45.40     A-Bark Corral East  2010
45.40     A-Bark Corral East  2010

Thanks

--
Curtis Burkhalter
https://sites.google.com/site/curtisburkhalter/
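A variant of the split-and-sample idea from the reply works on row numbers instead of a list of data frames, which avoids rbind() on thousands of small pieces. This is a sketch under the assumption that "no more than 3 records per IDbyYear" is what is wanted; the data frame below just retypes the example rows, and the name picked is made up.

```r
set.seed(42)
dta <- data.frame(IDbyYear = c(rep(42.24, 6), 45.32, 45.32, 45.36, 45.40, 45.40),
                  SiteID = c(rep("A-Airport", 6), rep("A-Bark Corral East", 5)),
                  Year = c(rep(2006L, 6), 2008L, 2008L, 2009L, 2010L, 2010L))

# For each IDbyYear, pick at most 3 of its row numbers at random
rows <- unlist(lapply(split(seq_len(nrow(dta)), dta$IDbyYear), function(i) {
  # index into i, so a group with a single row number n is not sampled as 1:n
  i[sample.int(length(i), min(3, length(i)))]
}))

picked <- dta[sort(rows), ]
table(picked$IDbyYear)
```

Groups with three or fewer rows are kept whole; only the six "A-Airport" rows are thinned to three.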
Re: [R] 2D Timeseries trace plot
You can do this with the animation package. Install the package and then

# Load the package
library(animation)

# This representation makes your data more portable using the dput() function:
pen <- structure(list(x = c(1073L, 1072L, 1066L, 1052L, 1030L, 1009L,
994L), y = c(1058L, 1085L, 1117L, 1152L, 1196L, 1242L, 1286L),
time = c(769.05, 769.07, 769.08, 769.1, 769.12, 769.13, 769.14
)), .Names = c("x", "y", "time"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7"))

# Compute the time between each step
diftime <- diff(pen$time)

# Draw a blank plot window using the ranges for x and y
with(pen, plot(NA, xlim=c(min(x), max(x)), ylim=c(min(y), max(y)),
    xlab="", ylab="", axes=FALSE))

# Pause for a second
ani.pause(1)

# Draw the curve pausing between points.
for(i in 1:6) {
    ani.pause(diftime[i]*10) # Multiply by ten to slow things down
    segments(pen$x[i], pen$y[i], pen$x[i+1], pen$y[i+1])
}

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of AjayT
Sent: Tuesday, March 3, 2015 8:59 AM
To: r-help@r-project.org
Subject: [R] 2D Timeseries trace plot

Hi,

I've got a 2D timeseries of handwriting samples,

     x    y   time
1 1073 1058 769.05
2 1072 1085 769.07
3 1066 1117 769.08
4 1052 1152 769.10
5 1030 1196 769.12
6 1009 1242 769.13
7  994 1286 769.14
up to 500

I was just wondering how to plot this as an animation, so that the points join up as they are rendered in time. Basically showing how the person who generated the data writes.

The time index is not regular and if possible I'd like to avoid padding the data with duplicate entries. For example adding a duplicate of the first row, for a 'padded' time 769.06.

Thanks a lot for your help :)

--
View this message in context: http://r.789695.n4.nabble.com/2D-Timeseries-trace-plot-tp4704127.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Summing certain values within columns that satisfy a certain condition
Here is another approach

> maxv <- apply(df, 2, max) # Get the column maximums
> maxv0 <- ifelse(maxv == 0, -1, maxv) # Replace 0 maximums with -1
> Sum <- rowSums(sweep(df, 2, maxv0, "=="))
> data.frame(df, Sum)
   A B C D Sum
1  0 1 0 7   1
2  0 2 0 7   1
3  0 3 0 7   1
4  0 4 0 7   1
5  0 1 0 0   0
6  0 0 0 0   0
7  0 0 0 0   0
8  0 0 0 0   0
9  0 0 1 5   0
10 0 5 1 5   0
11 0 4 1 5   0
12 0 8 4 7   3
13 0 0 3 0   0
14 0 0 3 4   0
15 0 0 3 4   0
16 0 0 0 5   0
17 0 2 0 6   0
18 0 0 4 0   1
19 0 0 4 0   1
20 0 0 4 0   1

-------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Don McKenzie
Sent: Thursday, February 26, 2015 3:12 PM
To: Kate Ignatius
Cc: r-help
Subject: Re: [R] Summing certain values within columns that satisfy a certain condition

Kate, here is a transparent solution (tested but without NA treatment). Doubtless there are cleverer faster ones, which later posters will present.

HTH

# example with four columns and 20 rows
nrows <- 20
A <- sample(c(1:100), nrows, replace=T)
B <- sample(c(1:100), nrows, replace=T)
C <- sample(c(1:100), nrows, replace=T)
D <- sample(c(1:100), nrows, replace=T)

locs <- c(c(1:nrows)[A==max(A)],c(1:nrows)[B==max(B)],c(1:nrows)[C==max(C)],c(1:nrows)[D==max(D)])

mat1 <- matrix(rep(0,4*nrows),nrows,4)

for (i in 1:4) mat1[,i][locs[i]] <- 1

SUM <- rowSums(mat1)

> On Feb 26, 2015, at 12:23 PM, Kate Ignatius wrote:
>
> Hi,
>
> Supposed I had a data frame like so:
>
> A B C D
> 0 1 0 7
> 0 2 0 7
> 0 3 0 7
> 0 4 0 7
> 0 1 0 0
> 0 0 0 0
> 0 0 0 0
> 0 0 0 0
> 0 0 1 5
> 0 5 1 5
> 0 4 1 5
> 0 8 4 7
> 0 0 3 0
> 0 0 3 4
> 0 0 3 4
> 0 0 0 5
> 0 2 0 6
> 0 0 4 0
> 0 0 4 0
> 0 0 4 0
>
> For each row, I want to count how many max column values appear to
> eventually get the following outcome, while ignoring zeros and NAs:
>
> A B C D Sum
> 0 1 0 7 1
> 0 2 0 7 1
> 0 3 0 7 1
> 0 4 0 7 1
> 0 1 0 0 0
> 0 0 0 0 0
> 0 0 0 0 0
> 0 0 0 0 0
> 0 0 1 5 0
> 0 5 1 5 0
> 0 4 1 5 0
> 0 8 4 7 3
> 0 0 3 0 0
> 0 0 3 4 0
> 0 0 3 4 0
> 0 0 0 5 0
> 0 2 0 6 0
> 0 0 4 0 1
> 0 0 4 0 1
> 0 0 4 0 1
>
> I've used the following code but it doesn't seem to work (my sum
> column is all 1s):
>
> (apply(df,1, function(x) (sum(x %in% c(pmax(x))
>
> Is this code too simple?
>
> Thanks!
>
> K.
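The question also asks that NAs be ignored, which neither posted solution handles. One way to cover that case is sketched below on a four-row toy data frame (invented here, not the poster's data): the column maxima are computed with na.rm = TRUE, and a maximum of zero is turned into NA so that it can never match.

```r
df <- data.frame(A = c(0, 0, 0, 0),
                 B = c(1, 8, 0, NA),
                 C = c(0, 4, 4, 2),
                 D = c(7, 7, 0, 5))

m <- as.matrix(df)
colmax <- apply(m, 2, max, na.rm = TRUE)  # column maxima, ignoring NAs
colmax[colmax == 0] <- NA                 # a zero maximum should not be counted

# t(m) has one row per variable, so colmax lines up with its rows;
# comparisons involving NA give NA and are dropped by na.rm = TRUE
hits <- t(t(m) == colmax)
df$Sum <- rowSums(hits, na.rm = TRUE)
df
```

Here the second row holds the maxima of B, C and D and gets Sum = 3, while the row containing the NA gets 0.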
Re: [R] One column listing on wide monitors too
Here are several ways:

> a <- paste("String", 1:16)
> a
 [1] "String 1"  "String 2"  "String 3"  "String 4"  "String 5"  "String 6"
 [7] "String 7"  "String 8"  "String 9"  "String 10" "String 11" "String 12"
[13] "String 13" "String 14" "String 15" "String 16"
> matrix(a, length(a))
      [,1]
 [1,] "String 1"
 [2,] "String 2"
. . .
[15,] "String 15"
[16,] "String 16"
> t(t(a))
      [,1]
 [1,] "String 1"
 [2,] "String 2"
. . .
[15,] "String 15"
[16,] "String 16"
> b <- a
> dim(b) <- c(16, 1)
> b
      [,1]
 [1,] "String 1"
 [2,] "String 2"
. . .
[15,] "String 15"
[16,] "String 16"
> cat(a, sep="\n") # But no numbering
String 1
String 2
. . .
String 15
String 16

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of marekl
Sent: Tuesday, February 24, 2015 2:09 PM
To: r-help@r-project.org
Subject: [R] One column listing on wide monitors too

Hi,

this is probably a very basic question, but I still can't find an answer. R shows listings in more columns on wider monitors. Like on this picture: http://i.imgur.com/GLF70r9.png

Is there a way to set R to show listings like this, in one column only?

[1] "String 1"
[2] "String 2"
[3] "String 3"
...
[16] "String 16"

Thank you

--
View this message in context: http://r.789695.n4.nabble.com/One-column-listing-on-wide-monitors-too-tp4703781.html
Sent from the R help mailing list archive at Nabble.com.
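Two further options keep the [1], [2], ... style numbering the poster asked for. This is only a sketch: the width of 20 is an arbitrary value chosen to be too narrow for two of these strings, and the names one_col and out are made up.

```r
a <- paste("String", 1:16)

# Option 1: build the numbered lines by hand
one_col <- paste0("[", seq_along(a), "] ", a)
writeLines(one_col)

# Option 2: narrow the console width temporarily so print() fits only one
# element per line, then restore the previous setting
old <- options(width = 20)
out <- capture.output(print(a))
options(old)
writeLines(out)
```

Option 2 leaves the vector untouched and only changes how wide print() believes the console is.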
Re: [R] How to Deploy a 'poLCA' Model?
Looking at package poLCA I see functions poLCA.predcell() and poLCA.table(). If these do not do what you want, you will need to be clearer and provide a reproducible example.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of sagnik chakravarty
Sent: Monday, February 23, 2015 6:06 AM
To: d...@votamatic.org
Cc: r-help
Subject: [R] How to Deploy a 'poLCA' Model?

Hi Drew,

I was working with 'poLCA' to fit a latent-class model with covariates [formula: f=cbind(y1,y2,y3) ~ x1*x2*x3*x4]. The output contains a fit table with coefficients, t-value, std_error and P-value for different combinations of the covariates. Now if I want to deploy this model to a new dataset, as we do for any other model with the 'predict' function, how should I proceed? I couldn't find any predict function described in the package documentation.

Kindly help.

Thanks,
--
Regards,
SAGNIK CHAKRAVARTY
Re: [R] Extracting Factor Pattern Matrix Similar to Proc Factor
The pattern matrix is easy to compute from the results of princomp(). First we need a reproducible example so we'll use the iris data set (use ?iris for details) that comes with R.

> data(iris)
> iris.pc <- princomp(iris[,-5], cor=TRUE)
> print(iris.pc$loadings, cutoff=0)

Loadings:
             Comp.1 Comp.2 Comp.3 Comp.4
Sepal.Length  0.521 -0.377  0.720  0.261
Sepal.Width  -0.269 -0.923 -0.244 -0.124
Petal.Length  0.580 -0.024 -0.142 -0.801
Petal.Width   0.565 -0.067 -0.634  0.524

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00

The object iris.pc is a list with 7 elements. One of those, iris.pc$loadings, contains the standardized loadings so that the sum of the squared values in each column is 1. The default print method suppresses the printing of small loadings (< .1) so I've set cutoff=0 so we see them all. To get the pattern matrix we just need to multiply each of the columns by iris.pc$sdev (the square roots of the eigenvalues):

> iris.pat <- sweep(iris.pc$loadings, 2, iris.pc$sdev, "*")
> print(iris.pat, cutoff=0)

Loadings:
             Comp.1 Comp.2 Comp.3 Comp.4
Sepal.Length  0.890 -0.361  0.276  0.038
Sepal.Width  -0.460 -0.883 -0.094 -0.018
Petal.Length  0.992 -0.023 -0.054 -0.115
Petal.Width   0.965 -0.064 -0.243  0.075

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings     2.918  0.914  0.147  0.021
Proportion Var  0.730  0.229  0.037  0.005
Cumulative Var  0.730  0.958  0.995  1.000
> iris.pc$sdev^2
    Comp.1     Comp.2     Comp.3     Comp.4
2.91849782 0.91403047 0.14675688 0.02071484

The sweep() function multiplies each column by its standard deviation. Now the squared values in each column sum to the eigenvalue.

Alternatively, you can install the "psych" package which computes the pattern (structure) matrix directly:

> library(psych)
> iris.pca <- principal(iris[,-5], nfactors=4, rotate="none")
> print(iris.pca$Structure, cutoff=0)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Scott Colwell
Sent: Monday, February 23, 2015 12:15 PM
To: r-help@r-project.org
Subject: [R] Extracting Factor Pattern Matrix Similar to Proc Factor

Hello,

I am fairly new to R and coming from SAS IML. I am rewriting one of my MC simulations in R and am stuck on extracting a factor pattern matrix as would be done in IML using Proc Factor.

I have found the princomp() command and read through the manual but can't seem to figure out how to save the factor pattern matrix. I am waiting for the R for SAS Users book to arrive.

What I would use in SAS IML to get at what I am looking for is:

PROC FACTOR Data=MODELCOV15(TYPE=COV) NOBS=1 N=16 CORR OUTSTAT=FAC.FACOUT15;
RUN;

DATA FAC.PATTERN15;
SET FAC.FACOUT15;
IF _TYPE_='PATTERN';
DROP _TYPE_ _NAME_;
RUN;

Would any SAS IML to R converts be able to help me with this?

Thanks,

Scott Colwell, PhD

--
View this message in context: http://r.789695.n4.nabble.com/Extracting-Factor-Pattern-Matrix-Similar-to-Proc-Factor-tp4703704.html
Sent from the R help mailing list archive at Nabble.com.
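The scaling step can be checked against a direct eigen decomposition of the correlation matrix, using base R only. A sketch; the object names R, e, pat and pat2 are chosen here, and absolute values are compared because the sign of each component is arbitrary.

```r
data(iris)
R <- cor(iris[, -5])

# Eigen decomposition of the correlation matrix
e <- eigen(R)

# Pattern matrix: eigenvectors scaled by the square roots of the eigenvalues
pat <- e$vectors %*% diag(sqrt(e$values))
dimnames(pat) <- list(colnames(R), paste0("Comp.", 1:4))

colSums(pat^2)  # column sums of squares reproduce the eigenvalues

# Same thing via princomp(); compare absolute values because signs may flip
pc <- princomp(iris[, -5], cor = TRUE)
pat2 <- sweep(unclass(pc$loadings), 2, pc$sdev, "*")
all.equal(abs(pat), abs(pat2), check.attributes = FALSE)
```

With all four components retained, pat %*% t(pat) also rebuilds the correlation matrix, which is a convenient sanity check.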
Re: [R] Extracting Factor Pattern Matrix Similar to Proc Factor
Function principal() in psych takes a correlation matrix so use cov2cor() to convert:

library(psych)
iris.pca <- principal(cov2cor(cov(iris[,-5])), nfactors=4, rotate="none")
print(iris.pca$Structure, cutoff=0)

David

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Scott Colwell
Sent: Monday, February 23, 2015 3:34 PM
To: r-help@r-project.org
Subject: Re: [R] Extracting Factor Pattern Matrix Similar to Proc Factor

Thanks David. What do you do when the input is a covariance matrix rather than a dataset?

--
View this message in context: http://r.789695.n4.nabble.com/Extracting-Factor-Pattern-Matrix-Similar-to-Proc-Factor-tp4703704p4703719.html
Sent from the R help mailing list archive at Nabble.com.
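If psych is not available, base R's princomp() also accepts a covariance matrix through its covmat argument. A minimal sketch, with cov(iris[,-5]) standing in for whatever covariance matrix you already have; the names S, R, pc and pat are made up here.

```r
S <- cov(iris[, -5])   # stand-in for an existing covariance matrix
R <- cov2cor(S)        # the corresponding correlation matrix

# covmat= supplies the matrix directly; cor = TRUE rescales it to correlations
pc <- princomp(covmat = S, cor = TRUE)

# Scale the unit-length loadings by the component standard deviations
pat <- sweep(unclass(pc$loadings), 2, pc$sdev, "*")
round(pat, 3)
```

No scores are available in this case because princomp() never sees the raw data, but the loadings and sdev are all the pattern matrix needs.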
Re: [R] Replacing 9999 and 999 values with NA
Just for the record, you do not need cbind():

    wind <- data.frame(windSpeed, windDirec)

Using cbind() does not create a problem as long as the columns are all numeric, but if your data frame contains a mixture of numeric, factor, and character columns, cbind() will mess things up: it builds a matrix first, and a matrix can hold only one type.

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Alexandra Catena
Sent: Monday, February 23, 2015 11:50 AM
To: Frederic Ntirenganya
Cc: r-help@r-project.org
Subject: Re: [R] Replacing 9999 and 999 values with NA

The command, data[data == 9999] <- NA, worked! Thank you!

But just in case you wanted to know, I'm downloading the data and unzipping it through readLines. I then concatenate two columns (wind speed and direction) from the unzipped data through cbind but I make it into a data frame.

    wind = data.frame(cbind(windSpeed,windDirec))

Thanks,
Alexandra

On Sat, Feb 21, 2015 at 10:38 PM, Frederic Ntirenganya wrote:
> If you are reading the data frame using for instance read.csv, you can put
> in the argument na.strings = "9999".
> Another way to do that is data[data == 9999] <- NA.
>
> It would be good to tell us how you are reading your dataset.
>
> On Feb 21, 2015 6:49 AM, "Jeff Newmiller" wrote:
>>
>> You did not say how you imported the data, but if you used one of the
>> read.table variants (including read.csv) then you can use the na.strings
>> argument as documented in the help file for read.table.
>>
>> Next time please read the posting guide, as there are some useful tips in
>> there, such as posting using plain text (a setting in your email program) so
>> we don't get garbled info from you, and providing a reproducible example.
>>
>> ---
>> Jeff Newmiller
>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>> ---
>> Sent from my phone. Please excuse my brevity.
>>
>> On February 20, 2015 10:55:30 AM PST, Alexandra Catena wrote:
>> >Hello All,
>> >
>> >I have a data frame of two columns for wind. The first column is for
>> >wind speed and the second wind direction. I'm trying to replace the
>> >9999 values in the first column and the 999 values in the second
>> >column with NA. I tried to use the function ltdl.fix.df but it doesn't
>> >seem to do anything.
>> >
>> >> ltdl.fix.df(windMV, zero2na = FALSE, coded = 999)
>> >
>> > n = 9432 by p = 4 matrix checked, 0 NA(s) present
>> > 0 factor variable(s) present
>> > 5675 value(s) coded 999 set to NA
>> > 0 -ve value(s) set to +ve half the negative value
>> >
>> >I have R version 3.1.1
>> >
>> >Thanks,
>> >Alexandra

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
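The two approaches mentioned in this thread (recoding after import, and na.strings at import time) can be sketched as follows. The data values and the inline CSV text are made up for illustration; only the column names windSpeed and windDirec and the codes 9999 and 999 come from the thread.

```r
# Hypothetical data: 9999 marks a missing speed, 999 a missing direction
wind <- data.frame(windSpeed = c(3.1, 9999, 5.6, 2.0),
                   windDirec = c(180, 270, 999, 45))

# Recode column by column, so that a genuine 999 in one column
# is never confused with the missing-value code of another column
wind$windSpeed[wind$windSpeed == 9999] <- NA
wind$windDirec[wind$windDirec == 999]  <- NA

# Alternative at import time: na.strings applies to every column it reads
csv   <- "windSpeed,windDirec\n3.1,180\n9999,270\n5.6,999\n"
wind2 <- read.csv(text = csv, na.strings = c("9999", "999"))
```

The column-wise version is probably the safer default when the same number could be a real measurement in one column and a missing-value code in another.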
Re: [R] Correlation question
As Kehl pointed out, any linear function of the independent variable (speed) will have the same squared correlation with the dependent variable (dist), but only one linear function minimizes the squared deviations between the fitted values and the original values. The equation you are using is only applicable to that function, not to any of the others. In fact, for some linear functions it produces negative values:

> fitted.new <- 6*cars$speed
> cor(cbind(fitted.new, fitted.right, fitted.wrong, cars$dist))
             fitted.new fitted.right fitted.wrong
fitted.new    1.0000000    1.0000000    1.0000000 0.8068949
fitted.right  1.0000000    1.0000000    1.0000000 0.8068949
fitted.wrong  1.0000000    1.0000000    1.0000000 0.8068949
              0.8068949    0.8068949    0.8068949 1.0000000
> 1-sum((cars$dist-fitted.new)^2)/sum((cars$dist-mean(cars$dist))^2)
[1] -3.281849

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Thayn
Sent: Sunday, February 22, 2015 12:01 AM
To: Kehl Dániel
Cc: r-help@r-project.org
Subject: Re: [R] Correlation question

Of course! Thank you, I knew I was missing something painfully obvious. It seems, then, that this line

    1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)

is finding something other than the traditional correlation. I found this in a lecture introducing correlation, but now I'm not sure what it is. It does do a better job of showing that the fitted.wrong variable is not a good prediction of the distance.

On Feb 21, 2015, at 4:36 PM, Kehl Dániel wrote:
> Hi,
>
> try
>
> cor(fitted.right,fitted.wrong)
>
> should give 1 as both are a linear function of speed! Hence
> cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be
> the same.
>
> HTH
> d
>
> From: R-help [r-help-boun...@r-project.org] on behalf of Jonathan Thayn [jth...@ilstu.edu]
> Sent: February 21, 2015 22:42
> To: r-help@r-project.org
> Subject: [R] Correlation question
>
> I recently compared two different approaches to calculating the correlation
> of two variables, and I cannot explain the different results:
>
> data(cars)
> model <- lm(dist~speed,data=cars)
> coef(model)
> fitted.right <- model$fitted
> fitted.wrong <- -17+5*cars$speed
>
> When using the OLS fitted values, the lines below all return the same R2
> value:
>
> 1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(cars$dist,fitted.right)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/(49*sd(cars$dist)*sd(fitted.right)))^2
>
> However, when I use my estimated parameters to find the fitted values,
> "fitted.wrong", the first equation returns a much lower R2 value, which I
> would expect since the fit is worse, but the other lines return the same R2
> that I get when using the OLS fitted values.
>
> 1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(x=cars$dist,y=fitted.wrong)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/(49*sd(cars$dist)*sd(fitted.wrong)))^2
>
> I'm sure I'm missing something simple, but can someone explain the difference
> between these two methods of finding R2? Thanks.
>
> Jon
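The point of this thread can be checked with a short, self-contained sketch. The helper name r2_resid is introduced here for illustration only; the cars data and the hand-picked line -17 + 5*speed are taken from the messages above.

```r
# cars ships with base R (datasets package)
model        <- lm(dist ~ speed, data = cars)
fitted.ols   <- fitted(model)
fitted.other <- -17 + 5 * cars$speed      # hand-picked line from the thread

# Hypothetical helper: 1 - (residual sum of squares) / (total sum of squares)
r2_resid <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

r2_resid(cars$dist, fitted.ols)      # agrees with summary(model)$r.squared
cor(cars$dist, fitted.ols)^2         # same value for the OLS fit
r2_resid(cars$dist, fitted.other)    # lower, because this line fits worse
cor(cars$dist, fitted.other)^2       # unchanged: cor() ignores slope and intercept
```

The squared correlation only measures how well some linear rescaling of the predictions would fit, whereas the residual-based formula scores the predictions exactly as given, which is why the two agree only for the least-squares line.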
Re: [R] Chi-square test
That is probably why chisq.test() has the rescale.p= argument. Your second problem with small expected values can be handled with simulate.p.value=.

> chisq.test(f, p=p11)
Error in chisq.test(f, p = p11) : probabilities must sum to 1.
> 1-sum(p11)
[1] 4.3036e-08
> chisq.test(f, p=p11, rescale.p=TRUE)

        Chi-squared test for given probabilities

data:  f
X-squared = 7.6268, df = 14, p-value = 0.9078

Warning message:
In chisq.test(f, p = p11, rescale.p = TRUE) :
  Chi-squared approximation may be incorrect
> chisq.test(f, p=p11, rescale.p=TRUE, simulate.p.value=TRUE)

        Chi-squared test for given probabilities with simulated p-value
        (based on 2000 replicates)

data:  f
X-squared = 7.6268, df = NA, p-value = 0.7996

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Berend Hasselman
Sent: Friday, February 20, 2015 12:13 PM
To: pari hesabi
Cc: r-help@r-project.org
Subject: Re: [R] Chi-square test

> On 20-02-2015, at 19:05, pari hesabi wrote:
>
> Hello,
> If the vector of observed frequencies is:
> f<-c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
> and the vector of probability :p11<-c(7.577864e-06, 1.999541e-04
> ,1.833510e-03, 9.059845e-03, 2.886977e-02, 6.546229e-02 ,1.124083e-01,
> 1.525880e-01, 1.689712e-01, 1.563522e-01, 1.232031e-01, 8.395000e-02,
> 5.009534e-02, 2.645857e-02,0.0205403)
> The sum of the probabilities is equal to one. But when I want to do the
> Chi-square test, I get this error: probabilities must sum to one.

print sum(p11)-1

> Does anybody know the reason?

R FAQ 7.31 (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

> Best Regards,
> pari
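A minimal sketch of why the error appears, using the f and p11 vectors posted above. The comparison against sqrt(.Machine$double.eps) is my assumption about the tolerance chisq.test() applies; check the function source for your R version before relying on it. The seed and the name p_fixed are arbitrary choices for this illustration.

```r
f   <- c(0, 0, 0, 2, 3, 6, 17, 15, 21, 21, 14, 10, 5, 1, 5)
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02,
         6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01,
         1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)

# The printed probabilities were rounded, so they no longer sum exactly to 1
gap <- abs(sum(p11) - 1)
gap                                    # about 4.3e-08, as shown in the thread
gap > sqrt(.Machine$double.eps)        # assumed tolerance; TRUE means "too far from 1"

# Rescaling by hand has the same effect as asking for rescale.p = TRUE
p_fixed <- p11 / sum(p11)

set.seed(42)                           # arbitrary seed for the simulated p-value
res <- chisq.test(f, p = p_fixed, simulate.p.value = TRUE, B = 2000)
res$statistic
```

The simulated p-value will vary slightly with the seed, but the X-squared statistic itself does not depend on it.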
Re: [R] subsamples and regressions for 100 times
Expanding a bit on Michael's answer, you don't need the sampling package for this, just the sample.int() function to draw a random set of integers that you will use to extract rows from each of your groups. Then write a function that returns what you want, the regression slopes from each group, and use that function with the replicate() function. Your problem is a good way to illustrate the lapply(), sapply(), replicate() family of functions in R:

    # Split the data into a list of data frames
    datlist <- split(dat, dat$L_group)

    # Write a function to draw the sample and perform the regression on each group
    slopes <- function(lst) {
        # Get the minimum sample size
        minsize <- min(sapply(lst, nrow))
        # Draw sample (row numbers) of size minsize from each group
        samlist <- lapply(sapply(lst, nrow), sample.int, size=minsize)
        # Extract sample from each group
        samples <- lapply(names(lst), function(x) lst[[x]][samlist[[x]],])
        # Run the regressions for each group and extract the slopes
        results <- sapply(samples, function(x) coef(lm(co2~temp, x))[2])
        # Use the group names to label the slopes
        names(results) <- names(lst)
        return(results)
    }

    # You can get a single set of results with
    (results <- slopes(datlist))
    #         A         B         C
    # 1.0128392 0.2658041 1.3423786

    # To get 100 runs
    many <- t(replicate(100, slopes(datlist)))
    head(many)
    #              A         B        C
    # [1,] 1.4326103 0.2658041 1.357475
    # [2,] 1.4754324 0.2658041 1.309208
    # [3,] 0.9838589 0.2658041 1.408987
    # [4,] 0.9993144 0.2658041 1.354297
    # [5,] 1.0134187 0.2658041 1.397112
    # [6,] 1.4922856 0.2658041 1.312531

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Dewey
Sent: Tuesday, February 17, 2015 9:52 AM
To: Angela Smith; r-help@r-project.org
Subject: Re: [R] subsamples and regressions for 100 times

Comment inline

On 17/02/2015 12:40, Angela Smith wrote:
> Hi R user,
> I'm new to R so my problem is probably pretty simple but I'm stuck:
>
> My data consist of 2 variables: co2, temp and one treatment (L_group).
> The sample size is different among the treatments, so I wanted to make
> equal sample size among three groups (A, B and C) of the treatment.

Not sure whether that is necessary for regression but you did not tell us why you want to do that.

> For this one, I used subsamples technique. Using subsample, each time
> the data are different among the three groups of the treatment.
>
> so that I want to run regression (co2~temp) for a 100 subsamples for
> each group of treatment (100 times subsample).

The usual way to do this is to store the subsamples in a list and then write a function and use lapply, say to store your models. You then have another list to which you can then apply the extractor function of your choice.

> it means that I will have 100 regression equations. Later, I want to
> compare the slope of the regression among the three groups. Is there a
> simple way to make a loop so that I can compare it?
>
> Thanks in advance!
>
> Angela
>
> Here is the example:
>
> dat<-structure(list(co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23,
> 0.26, 0.35, 0.41, 0.45, 0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34,
> 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26), temp = c(0.0119,
> 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339, 0.397,
> 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107, 0.0112,
> 0.0119, 0.012, 0.0092, 0.055, 0.089), L_group = structure(c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")),
> .Names = c("co2", "temp", "L_group"), class = "data.frame",
> row.names = c(NA, -24L))
>
> head(dat)
> library(sampling)
>
> # strata.sampling -
> strata.sampling <- function(data, group, size, method = NULL) {
>   require(sampling)
>   if (is.null(method)) method <- "srswor"
>   temp <- data[order(data[[group]]), ]
>   ifelse(length(size) > 1,
>          size <- size,
>          ifelse(size < 1,
>                 size <- round(table(temp[group]) * size),
>                 size <- rep(size, times=length(table(temp[group])))))
>   strat = strata(temp, stratanames = names(temp[group]),
>                  size = size, method = method)
>   getdata(temp, strat)
> }
>
> #--
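For the last part of the question (comparing the slopes across the 100 runs), one possible sketch is to summarise each group's column of slopes with quantiles. The data are the values posted above, rebuilt with data.frame(); the names groups, n_min and one_run and the seed are illustrative assumptions, and this is a variation on the approach in the reply, not a replacement for a formal test of slope differences.

```r
dat <- data.frame(
  co2  = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35, 0.41, 0.45,
           0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34, 0.141, 0.145, 0.153,
           0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339,
           0.397, 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107,
           0.0112, 0.0119, 0.012, 0.0092, 0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

set.seed(1)                                    # arbitrary, for repeatability
groups <- split(dat, dat$L_group)
n_min  <- min(vapply(groups, nrow, integer(1)))

# One run: subsample n_min rows per group and return the three slopes
one_run <- function() {
  vapply(groups, function(g) {
    rows <- sample.int(nrow(g), n_min)
    unname(coef(lm(co2 ~ temp, data = g[rows, ]))["temp"])
  }, numeric(1))
}

many <- t(replicate(100, one_run()))           # 100 rows, one column per group

# Spread of the slopes per group across the 100 subsamples
apply(many, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

Group B is the smallest group, so every subsample of it is the whole group and its slope never changes; only A and C show resampling variation.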
Re: [R] Picking Best Discriminant Function Variables
Look at the function stepclass() in package klaR.

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Moskowitz
Sent: Sunday, February 15, 2015 11:34 AM
To: n omranian via R-help
Subject: [R] Picking Best Discriminant Function Variables

Is there a way to have the LDA function give me the best 3 (or 4) predictor variables? When I put in all the variables, LDA uses all the variables, but I would like to know what would be the 3 (or 4) best to use out of all the available variables and the coefficients for those.

Here is the code I am using for Linear Discriminant Function

    library("MASS")
    results <- lda(data$V1 ~ data$V2 + data$V3 + data$V4 + data$V5 + data$V6 +
                   data$V7 + data$V8 + data$V9 + data$V10 + data$V11 + data$V12 +
                   data$V13 + data$V14)

Output:

Coefficients of linear discriminants:
                  LD1           LD2
data$V2  -0.403399781  0.8717930699
data$V3   0.165254596  0.3053797325
data$V4  -0.369075256  2.3458497486
data$V5   0.154797889 -0.1463807654
data$V6  -0.002163496 -0.0004627565
data$V7   0.618052068 -0.0322128171
data$V8  -1.661191235 -0.4919980543
data$V9  -1.495818440 -1.6309537953
data$V10  0.134092628 -0.3070875776
data$V11  0.355055710  0.2532306865
data$V12 -0.818036073 -1.5156344987
data$V13 -1.157559376  0.0511839665
data$V14 -0.002691206  0.0028529846

So in the above example, I would like the LDA to return to me the 3 best predictors out of the 13 available.

Thank you
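For a small number of candidate variables, a brute-force alternative to stepwise selection is to fit every subset of a given size and keep the one with the best leave-one-out accuracy. The sketch below is not what stepclass() does; it uses only MASS::lda() with CV = TRUE, and it uses the built-in iris data with subsets of size 2 as a stand-in, since the poster's data are not available. The names X, y, k, subsets and acc are illustrative.

```r
library(MASS)   # lda() lives in MASS, which ships with R

X <- iris[, 1:4]          # stand-in predictors
y <- iris$Species         # stand-in grouping variable
k <- 2                    # subset size to search for

# Every combination of k predictor names
subsets <- combn(names(X), k, simplify = FALSE)

# Leave-one-out classification accuracy for each subset
acc <- vapply(subsets, function(v) {
  fit <- lda(X[, v, drop = FALSE], grouping = y, CV = TRUE)
  mean(fit$class == y)
}, numeric(1))

best <- subsets[[which.max(acc)]]
best
max(acc)

# Refit on the winning subset to see its discriminant coefficients
lda(X[, best, drop = FALSE], grouping = y)$scaling
```

With 13 predictors and subsets of size 3 this would mean choose(13, 3) = 286 fits, which is still quick, but picking variables by accuracy on the same data is optimistic, so the result is best treated as exploratory.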
Re: [R] Loop over regression results
Or for the slopes and t-values:

> do.call(rbind, lapply(mod, function(x) summary(x)[["coefficients"]][2,]))
            Estimate Std. Error  t value     Pr(>|t|)
setosa     0.8371922  0.5049134 1.658091 1.038211e-01
versicolor 1.0536478  0.1712595 6.152348 1.41e-07
virginica  0.6314052  0.1428938 4.418702 5.647610e-05

David C

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Monday, February 16, 2015 8:52 AM
To: Ronald Kölpin; r-help@r-project.org
Subject: Re: [R] Loop over regression results

In R you would want to combine the results into a list. This could be done when you create the regressions or afterwards. To repeat your example using a list:

    data(iris)
    taxon <- levels(iris$Species)
    mod <- lapply(taxon, function (x) lm(Sepal.Width ~ Petal.Width,
                  data=iris, subset=Species==x))
    names(mod) <- taxon
    lapply(mod, summary)
    coeffs <- do.call(rbind, lapply(mod, coef))
    coeffs
    #            (Intercept) Petal.Width
    # setosa        3.222051   0.8371922
    # versicolor    1.372863   1.0536478
    # virginica     1.694773   0.6314052

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ronald Kölpin
Sent: Monday, February 16, 2015 7:37 AM
To: r-help@r-project.org
Subject: [R] Loop over regression results

Dear all,

I have a problem when trying to present the results of several regressions. Say I have run several regressions on a dataset and saved the different results (as in the mini example below). I then want to loop over the regression results in order to save certain values to a matrix (in order to put them into a paper or presentation).

Aside from the question of how to access certain information stored by lm() (or printed by summary()) I can't seem to loop over lm() objects -- no matter whether they are stored in a vector or a list. They are always evaluated immediately when called. I tried quote() or substitute() but that didn't work either as "Objects of type 'symbol' cannot be indexed."

In Stata I would simply do something like

    forvalues k = 1/3 {
        quietly estimates restore mod`k'
        // [...]
    }

and I am looking for the R equivalent of that syntax.

Kind regards and thanks
RK

    attach(iris)
    mod1 <- lm(Sepal.Width ~ Petal.Width, data=iris, subset=Species=="setosa")
    mod2 <- lm(Sepal.Width ~ Petal.Width, data=iris, subset=Species=="versicolor")
    mod3 <- lm(Sepal.Width ~ Petal.Width, data=iris, subset=Species=="virginica")
    summary(mod1); summary(mod2); summary(mod3)
    mat <- matrix(data=NA, nrow=3, ncol=5,
                  dimnames=list(1:3, c("Model", "Intercept", "p(T > |T|)", "Slope", "R^2")))
    mods <- c(mod1, mod2, mod3)
    for(k in 1:3) {
        mod <- mods[k]
        mat[2,k] <- as.numeric(coef(mod))[1]
        mat[3,k] <- as.numeric(coef(mod))[1]
    }
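Putting the two replies together, the kind of summary matrix the original poster asked for can be built in one pass over a list of models. This is a sketch under the same iris setup as above; the column labels Intercept, Slope, p.slope and R2 and the object names mods and tab are my own choices.

```r
taxon <- levels(iris$Species)

# A list() keeps each lm object intact; c() would flatten the objects
# into their components, which is why looping over c(mod1, mod2, mod3) fails
mods <- lapply(taxon, function(sp)
  lm(Sepal.Width ~ Petal.Width, data = iris, subset = Species == sp))
names(mods) <- taxon

# One row per model: intercept, slope, p-value of the slope, R-squared
tab <- t(sapply(mods, function(m) {
  s <- summary(m)
  c(Intercept = unname(coef(m)[1]),
    Slope     = unname(coef(m)[2]),
    p.slope   = s$coefficients[2, 4],
    R2        = s$r.squared)
}))
tab
```

Because each element of mods is a complete model, mods[["setosa"]] or a for loop over seq_along(mods) with mods[[k]] also works when an explicit loop is preferred.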
Re: [R] Coordinate or top left corner + offset
Thanks, I didn't know about corner.label. I started with legend but I couldn't find a way to make the box small enough. It always covered much more of the corner than the letter, which could have obscured data points.

David

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Bolker
Sent: Monday, February 9, 2015 5:43 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Coordinate or top left corner + offset

David L Carlson (tamu.edu) writes:
>
> This is more complicated, but it could be rolled up into a function.
> Replace your mtext() call with the following:
>
> # Set character expansion size
> cx <- 2.5
> # Get the plot coordinates and the character size
> ur <- par("usr")[c(1, 4)]
> chr <- par("cxy")
> rect(ur[1]+chr[1]/10, ur[2]-chr[2]*cx, ur[1]+chr[1]*cx, ur[2]-chr[2]/10,
>      border=NA, col="white")
> text(ur[1]+chr[1]*cx/2, ur[2]-chr[2]*cx/2, "a", font=2, cex=2.5, col="red")
>
> 1) Assign to cx the cex= value that you are using in text().
> 2) Then get the upper left corner of the plot window and the size of the
>    default character in user coordinate units.
> 3) Draw a white rectangle the size of the character you are plotting (in
>    this case cex=2.5). Shrink the left and top edge so that the box around
>    the plot area is not obscured.
> 4) Plot your character in the center of the box.
>

There are two more tricks you can use here:

(1) cheat by using legend()

    plot(0:10,0:10)
    legend("topleft",legend=NA,title="hello",bty="n")

(2) use plotrix::corner.label
Re: [R] Variance is different in R vs. Excel?
Time for a new version of Excel? I cannot duplicate your results in Excel 2013.

R:

> apply(dat, 2, var)
[1] 21290.80 24748.75

Excel 2013:

=VAR.S(A2:A21)    =VAR.S(B2:B21)
21290.8           24748.74737

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Fetter
Sent: Monday, February 9, 2015 3:33 PM
To: r-help@r-project.org
Subject: [R] Variance is different in R vs. Excel?

Hello everyone,

I have a simple question. When I use the var() function in R to find a variance, it differs greatly from the variance found in Excel using the =VAR.S function. Any explanations on what those two functions are actually doing?

Here is the data and the results:

    dat<-matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,
                  597,872,476,1026,423,476,419,591,376,640,550,601,588,499,
                  646,693,351,730,632,707,779,838,814,771,533,818),
                nrow=20, ncol=2, byrow=T)

    var(dat[,1]) #21290.8
    var(dat[,2]) #24748.75

    #in Excel, the variance of dat[,1] = 44763.91; for dat[,2] = 52034.2

Thanks,
Karl
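On the question of what the two functions compute: both var() in R and VAR.S in Excel are the sample variance, with n - 1 in the denominator. A short check with the posted data follows; the names sample_var and pop_var are illustrative.

```r
dat <- matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,
                597,872,476,1026,423,476,419,591,376,640,550,601,588,499,
                646,693,351,730,632,707,779,838,814,771,533,818),
              nrow = 20, ncol = 2, byrow = TRUE)

x <- dat[, 1]
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
sample_var <- sum((x - mean(x))^2) / (n - 1)

# Population variance (Excel's VAR.P) divides by n instead
pop_var <- sum((x - mean(x))^2) / n

sample_var
var(x)        # same as sample_var
pop_var       # slightly smaller
```

Neither denominator comes anywhere near the 44763.91 reported from Excel, so the discrepancy is more likely to come from which cells the spreadsheet formula covered than from a different definition of variance.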
Re: [R] Coordinate or top left corner + offset
This is more complicated, but it could be rolled up into a function. Replace your mtext() call with the following:

    # Set character expansion size
    cx <- 2.5
    # Get the plot coordinates and the character size
    ur <- par("usr")[c(1, 4)]
    chr <- par("cxy")
    rect(ur[1]+chr[1]/10, ur[2]-chr[2]*cx, ur[1]+chr[1]*cx, ur[2]-chr[2]/10,
         border=NA, col="white")
    text(ur[1]+chr[1]*cx/2, ur[2]-chr[2]*cx/2, "a", font=2, cex=2.5, col="red")

1) Assign to cx the cex= value that you are using in text().
2) Then get the upper left corner of the plot window and the size of the default character in user coordinate units.
3) Draw a white rectangle the size of the character you are plotting (in this case cex=2.5). Shrink the left and top edge so that the box around the plot area is not obscured.
4) Plot your character in the center of the box.

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Pascal A. Niklaus
Sent: Monday, February 9, 2015 10:27 AM
To: r-help@r-project.org
Subject: [R] Coordinate or top left corner + offset

Dear all,

I am struggling to add annotations to panels of a series of plots arranged on a page. Basically, I'd like to add letters enumerating the panels ("a","b","c",...), at a fixed distance from the top left corner of the plot's "box".

I succeeded partly with "mtext" (see below), but the "at" option is in user coordinates, which makes it difficult to specify a given offset from the corner (e.g. 1cm from top and left). I tried grid's "npc" but these coordinates refer to the entire plot instead of the current inner plotting region.

Phrased differently, I'd like to place text (and ideally also be able to plot, e.g. a white disc to cover background items) at position (top-1cm,left+1cm)

Here is a minimum working example illustrating what I try to achieve:

    pdf("example.pdf",width=15,height=15)
    m <- rbind( c(0.1,0.9,0.1,0.6), c(0.1,0.9,0.6,0.9) );
    split.screen(m)

    screen(1);
    par(mar=c(0,0,0,0));
    plot(rnorm(10),rnorm(10),xlim=c(-5,5),xaxt="n",yaxt="n");
    mtext(quote(bold(a)),side=3,line=-2.5,at=-5,cex=2.5)

    screen(2);
    par(mar=c(0,0,0,0));
    plot(rnorm(10),rnorm(10),xlim=c(-3,3),xaxt="n",yaxt="n");
    mtext(quote(bold(a)),side=3,line=-2.5,at=-3,cex=2.5)

    close.screen(all.screens=TRUE)
    dev.off()

Thanks for your help

Pascal Niklaus
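Another way to get a fixed physical offset from the corner, different from the par("cxy") approach in the reply, is to convert between coordinate systems with grconvertX() and grconvertY() from base graphics. The sketch below rolls that into a helper; corner_label and its offset_cm argument are hypothetical names introduced here, and the null PDF device is used only so that the example runs without writing a file.

```r
# Hypothetical helper: draw `label` offset_cm centimetres in from the
# top-left corner of the current plot region, whatever the axis limits are
corner_label <- function(label, offset_cm = 1, ...) {
  off <- offset_cm / 2.54                        # centimetres to inches
  # Corner of the plot region in inches, shifted, then back to user units
  x <- grconvertX(grconvertX(0, "npc", "inches") + off, "inches", "user")
  y <- grconvertY(grconvertY(1, "npc", "inches") - off, "inches", "user")
  text(x, y, label, ...)
  invisible(c(x = x, y = y))
}

pdf(NULL)                                        # null device, no file written
plot(rnorm(10), rnorm(10), xlim = c(-5, 5), xaxt = "n", yaxt = "n")
usr <- par("usr")                                # plot limits in user coordinates
pos <- corner_label("a", font = 2, cex = 2.5)
dev.off()
```

Because the offset is applied in inches, the label should sit at the same physical distance from the corner in every panel, including panels with different xlim values as in the posted example.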
Re: [R] Still trying to avoid loops
How about?

> ave(dat$D, dat$S, FUN=order)
[1] 2 1 1 1 2 3
> ave(dat_2$D, dat_2$S, FUN=order)
[1] 2 2 1 1 1 3

Note, your answer for the second example is incorrect since row 2 (c, 3) and row 5 (c, 2) are both assigned 2.

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Tom Wright
Sent: Wednesday, February 4, 2015 2:08 PM
To: Rui Barradas
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] Still trying to avoid loops

Thanks, I was not aware of order(). I did deliberately mess up the order of S. The following example breaks your solution

    dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')), D=c(5,3,1,3,2,4))

which should give the answer c(2,2,1,1,2,3)

Your solution does indicate that sorting the data correctly before starting might solve the problem.

On Wed, 2015-02-04 at 19:49 +, Rui Barradas wrote:
> Hello,
>
> Aren't the levels of your example wrong? If the levels are
> levels=c('a','b','c'), not c('b', 'a', 'c'), then the following will do
> the job.
>
> unname(unlist(tapply(dat$D, dat$S, order)))
>
> Hope this helps,
>
> Rui Barradas
>
> Em 04-02-2015 19:34, Tom Wright escreveu:
> > Given a dataframe:
> > dat<-data.frame(S=factor(c('a','b','a','c','c','c'),levels=c('b','a','c')),
> > D=c(1,5,3,2,3,4))
> >
> > where S is a subject identifier and D a visit (actually a date in my
> > real dataset). I would like to generate another column giving the visit
> > number
> >
> > R=c(2,1,1,1,2,3)
> >
> > My current solution uses nested loops and is slow and ugly. I've looked
> > at by() but can't see how to keep the order of R correct.
> >
> > Thanks,
> > Tom
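One caveat on the ave() answer: order() returns the permutation that sorts a vector, which is not always the same as each element's position in the sorted sequence. The two coincide in the examples posted above, but for a visit number rank() is probably the function that matches the intent. A small sketch, where the three-visit subject is a made-up case chosen to show the difference:

```r
# Made-up subject with three visits recorded out of order
S <- factor(c("a", "a", "a"))
D <- c(3, 1, 2)

ave(D, S, FUN = order)   # 2 3 1: positions of the smallest, middle, largest value
ave(D, S, FUN = rank)    # 3 1 2: the visit number of each row

# Second example from the thread: here order() and rank() agree
dat_2 <- data.frame(S = factor(c("a", "c", "a", "b", "c", "c")),
                    D = c(5, 3, 1, 3, 2, 4))
visit <- ave(dat_2$D, dat_2$S, FUN = rank)
visit
```

If two visits of the same subject can share a date, rank() returns averaged ranks by default; function(x) rank(x, ties.method = "first") would give distinct integers instead.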
Re: [R] naming rows/columns in 'array of matrices' | solved
You can also add names to the dimensions: > dimnames(P)[[1]] <- c("live","dead") > dimnames(P)[[2]] <- c("old","young") > names(dimnames(P)) <- c("status", "age", NULL) > P , , 1 age status old young live 1 2 dead 3 4 , , 2 age status old young live 5 6 dead 7 8 David L. Carlson Department of Anthropology Texas A&M University -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of peter dalgaard Sent: Saturday, January 31, 2015 2:19 AM To: Evan Cooch Cc: r-help@r-project.org Subject: Re: [R] naming rows/columns in 'array of matrices' | solved > On 30 Jan 2015, at 20:34 , Evan Cooch wrote: > > The (obvious, after the fact) solution at the bottom. D'oh... > [snip] > Forgot I was dealing with a multi-dimensional array, not a list. So, > following works fine. I'm sure there are better approaches (where 'better' is > either 'cooler', or 'more flexible'), but for the moment...) > > P <- array(0, c(2,2,2),dimnames=list(c("live","dead"),c("old","young"),NULL)) > > P[,,1] <- matrix(c(1,2,3,4),2,2,byrow=T); > P[,,2] <- matrix(c(5,6,7,8),2,2,byrow=T); > > print(P); > Just for completeness, this also works: > P <- array(0, c(2,2,2)) > P[,,1] <- matrix(c(1,2,3,4),2,2,byrow=T); > P[,,2] <- matrix(c(5,6,7,8),2,2,byrow=T); > dimnames(P)[[1]] <- c("live","dead") > dimnames(P)[[2]] <- c("live","dead") -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
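For readers following the thread, the three variants above can be collapsed into a single assignment. This is only a sketch: the dimension names "status" and "age" come from David's reply, and the idea of assigning the whole dimnames list at once is an editorial alternative, not something proposed in the thread.

```r
# Build the same 2 x 2 x 2 array as in the thread.
P <- array(0, c(2, 2, 2))
P[, , 1] <- matrix(1:4, 2, 2, byrow = TRUE)
P[, , 2] <- matrix(5:8, 2, 2, byrow = TRUE)

# Assign all dimnames in one step; the third dimension stays unlabelled (NULL).
dimnames(P) <- list(status = c("live", "dead"),
                    age    = c("old", "young"),
                    NULL)

# Named indexing now works on the first two dimensions.
P["live", "young", 1]   # row "live", column "young", first slice
P["dead", , 2]          # the "dead" row of the second slice
```

Indexing by name instead of by position tends to make later code easier to read when the array has more than two slices.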
Re: [R] Your personal email on the R-help mail list
Yes. I thought I was replying to a different message. Sorry. David -Original Message- From: Chel Hee Lee [mailto:chl...@mail.usask.ca] Sent: Thursday, January 29, 2015 11:33 AM To: David L Carlson Subject: Your personal email on the R-help mail list Hi David, I am not sure if you noticed that your personal conversation is on the R-help mailing list. Chel Hee Lee, PhD Biostatistician and Manager Clinical Research Support Unit College of Medicine University of Saskatchewan Canada On 1/29/2015 11:28 AM, David L Carlson wrote: > That's fine, but I'm here in town if you want me to pick her up at the > airport. > > David > > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chel Hee Lee > Sent: Thursday, January 29, 2015 9:18 AM > To: Jeff Newmiller; Alan Yong; r-help@r-project.org > Subject: Re: [R] Passing a Data Frame Name as a Variable in a Function > > I like Jeff's comments on the previous post. > > Regarding Alan's question, please see the following example. > > > df.1 <- data.frame(v1=1:5, v2=letters[1:5]) > > df.2 <- data.frame(v1=LETTERS[1:3], v2=11:13) > > DFName <- ls(pattern = glob2rx("df.*"))[1] > > DFName > [1] "df.1" > > length(DFName[,1]) > Error in DFName[, 1] : incorrect number of dimensions > > 'DFName' is a character vector of length 1 (it is neither a matrix nor a > data frame). In this case, you may try 'eval()' as below: > > > eval(parse(text=DFName)) > v1 v2 > 1 1 a > 2 2 b > 3 3 c > 4 4 d > 5 5 e > > eval(parse(text=DFName))[,1] > [1] 1 2 3 4 5 > > length(eval(parse(text=DFName))[,1]) > [1] 5 > > > > Is this what you are looking for? I hope this helps. > > Chel Hee Lee > > > On 1/29/2015 12:34 AM, Jeff Newmiller wrote: >> This approach is fraught with dangers. >> >> I recommend that you put all of those data frames into a list and have your >> function accept the list and the name and use the list indexing operator >> mylist[[DFName]] to refer to it. 
Having functions that go fishing around in >> the global environment will be hard to maintain at best, and buggy at worst. >> >> That said, I usually work with all of my data frames combined as one and use >> the plyr, dplyr, or data.table packages to apply my algorithms to each group >> of rows identified by a character or factor column. >> --- >> Jeff NewmillerThe . . Go Live... >> DCN:Basics: ##.#. ##.#. Live Go... >> Live: OO#.. Dead: OO#.. Playing >> Research Engineer (Solar/BatteriesO.O#. #.O#. with >> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k >> --- >> Sent from my phone. Please excuse my brevity. >> >> On January 28, 2015 5:37:34 PM PST, Alan Yong wrote: >>> Dear R-help, >>> I have df.1001 as a data frame with rows & columns of values. >>> >>> I also have other data frames named similarly, i.e., df.*. >>> >>> I used DFName from: >>> >>> DFName <- ls(pattern = glob2rx("df.*"))[1] >>> >>> & would like to pass on DFName to another function, like: >>> >>> length(DFName[, 1]) >>> >>> however, when I run: >>> >>>> length(DFName[, 1]) >>> Error in DFName[, 1] : incorrect number of dimensions >>> >>> and >>> >>> length(df.1001[, 1]) >>> [1] 104 >>> >>> do not provide the same expected answer. >>> >>> How can I successfully pass the data frame name of df.1001 as a >>> variable named DFName in a function? >>> >>> Thanks, >>> Alan >>> >>> __ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Passing a Data Frame Name as a Variable in a Function
That's fine, but I'm here in town if you want me to pick her up at the airport.

David

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chel Hee Lee
Sent: Thursday, January 29, 2015 9:18 AM
To: Jeff Newmiller; Alan Yong; r-help@r-project.org
Subject: Re: [R] Passing a Data Frame Name as a Variable in a Function

I like Jeff's comments on the previous post.

Regarding Alan's question, please see the following example.

> df.1 <- data.frame(v1=1:5, v2=letters[1:5])
> df.2 <- data.frame(v1=LETTERS[1:3], v2=11:13)
> DFName <- ls(pattern = glob2rx("df.*"))[1]
> DFName
[1] "df.1"
> length(DFName[,1])
Error in DFName[, 1] : incorrect number of dimensions

'DFName' is a character vector of length 1 (it is neither a matrix nor a data frame). In this case, you may try 'eval()' as below:

> eval(parse(text=DFName))
  v1 v2
1  1  a
2  2  b
3  3  c
4  4  d
5  5  e
> eval(parse(text=DFName))[,1]
[1] 1 2 3 4 5
> length(eval(parse(text=DFName))[,1])
[1] 5

Is this what you are looking for? I hope this helps.

Chel Hee Lee

On 1/29/2015 12:34 AM, Jeff Newmiller wrote:
> This approach is fraught with dangers.
>
> I recommend that you put all of those data frames into a list and have your
> function accept the list and the name and use the list indexing operator
> mylist[[DFName]] to refer to it. Having functions that go fishing around in
> the global environment will be hard to maintain at best, and buggy at worst.
>
> That said, I usually work with all of my data frames combined as one and use
> the plyr, dplyr, or data.table packages to apply my algorithms to each group
> of rows identified by a character or factor column.
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On January 28, 2015 5:37:34 PM PST, Alan Yong wrote:
>> Dear R-help,
>> I have df.1001 as a data frame with rows & columns of values.
>>
>> I also have other data frames named similarly, i.e., df.*.
>>
>> I used DFName from:
>>
>> DFName <- ls(pattern = glob2rx("df.*"))[1]
>>
>> & would like to pass on DFName to another function, like:
>>
>> length(DFName[, 1])
>>
>> however, when I run:
>>
>>> length(DFName[, 1])
>> Error in DFName[, 1] : incorrect number of dimensions
>>
>> and
>>
>> length(df.1001[, 1])
>> [1] 104
>>
>> do not provide the same expected answer.
>>
>> How can I successfully pass the data frame name of df.1001 as a
>> variable named DFName in a function?
>>
>> Thanks,
>> Alan
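Jeff's recommendation (keep the data frames in a list and index it by name) can be sketched as follows. The data frames reuse Chel Hee Lee's toy examples; the helper name n_rows_of is hypothetical and exists only for illustration. If the data frames really must stay as separate global objects, base R's get(DFName) is a simpler lookup than eval(parse(text = DFName)), though it shares the maintenance concerns Jeff describes.

```r
# Toy stand-ins for the poster's df.* data frames.
df.1 <- data.frame(v1 = 1:5, v2 = letters[1:5])
df.2 <- data.frame(v1 = LETTERS[1:3], v2 = 11:13)

# Keep the data frames in a named list instead of loose global variables.
dfs <- list(df.1 = df.1, df.2 = df.2)

# Hypothetical helper: takes the list and a name, and looks the data frame
# up with the list indexing operator [[ ]].
n_rows_of <- function(df_list, name) {
  stopifnot(name %in% names(df_list))
  nrow(df_list[[name]])
}

DFName <- "df.1"
n_rows_of(dfs, DFName)   # number of rows of the data frame called "df.1"
sapply(dfs, nrow)        # the same question for every data frame at once
```

A side benefit of the list is that sapply() or lapply() can then apply the same computation to every data frame without any name pattern matching.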
Re: [R] Working with < and > is data sets
Here is one way to fix the data:

# First note that "value" is a factor so we need to convert it to character
> str(zp)
'data.frame':   20 obs. of  2 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : Factor w/ 19 levels "<0.030","<1.2",..: 3 4 2 1 7 8 6 5 12 11 ...
> zp$value <- as.character(zp$value)
> str(zp)
'data.frame':   20 obs. of  2 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : chr  "1160" "27.3" "<1.2" "<0.030" ...

# Next we need to see which values are preceded by "<", and record that in
# a new variable, "note"
> zp$note <- ifelse(grepl("<", zp$value), "Limit", "Measured")

# Finally we strip the "<" off and convert "value" to numeric
> zp$value <- as.numeric(gsub("<", "", zp$value))
> str(zp)
'data.frame':   20 obs. of  3 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : num  1160 27.3 1.2 0.03 1870 45.7 0.85 0.025 695 31.9 ...
 $ note    : chr  "Measured" "Measured" "Limit" "Limit" ...
> head(zp)
  variable   value     note
1     ZP.1 1160.00 Measured
2     ZP.1   27.30 Measured
3     ZP.1    1.20    Limit
4     ZP.1    0.03    Limit
5     ZP.3 1870.00 Measured
6     ZP.3   45.70 Measured

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sam Albers
Sent: Monday, January 26, 2015 12:41 PM
To: r-help@r-project.org
Subject: [R] Working with < and > is data sets

Hello,

I am having some trouble figuring out how to deal with data that has some observations that are detection limits and others that are integers denoted by greater and less than symbols. Ideally I would like a column that has the data as numbers then another column with values "Measured" or "Limit" or something like that. Data and further clarification below.

##Data
zp<-structure(list(variable = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("ZP.1",
"ZP.3", "ZP.5", "ZP.7", "ZP.9"), class = "factor"), value = structure(c(3L,
4L, 2L, 1L, 7L, 8L, 6L, 5L, 12L, 11L, 10L, 9L, 15L, 16L, 14L, 13L,
19L, 18L, 17L, 9L), .Label = c("<0.030", "<1.2", "1160", "27.3",
"<0.025", "<0.85", "1870", "45.7", "<0.0020", "<0.050", "31.9",
"695", "<0.0060", "<0.20", "311", "8.84", "<0.090", "12", "646"
), class = "factor")), .Names = c("variable", "value"), row.names = c(NA,
-20L), class = "data.frame")

## As expected converting everything to numeric results in a slew of NA values
zp$valuefactor<-as.numeric(as.character(zp$value))

## At this point I am unsure how to proceed.
zp

### So I am just wondering how folks deal with this type of data. Any advice would be much appreciated as I am looking for something that will reliably work on a large data set.

Thanks in advance!

Sam
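The steps in David's reply can be wrapped in a small function so that they apply to any column of this kind. This is a sketch under assumptions: the function name split_censored is hypothetical, and the "Above" label for values prefixed with ">" is an editorial guess at what the subject line's ">" case would need, since the posted data only contain "<".

```r
# Split censored strings such as "<1.2" into a numeric value and a flag.
# Assumption: values starting with ">" are labelled "Above" (not from the thread).
split_censored <- function(x) {
  x <- trimws(as.character(x))              # factors must become character first
  note <- ifelse(grepl("^<", x), "Limit",
          ifelse(grepl("^>", x), "Above", "Measured"))
  value <- as.numeric(sub("^[<>]", "", x))  # drop one leading "<" or ">"
  data.frame(value = value, note = note, stringsAsFactors = FALSE)
}

# A few values from the posted data, plus one made-up ">" case.
res <- split_censored(factor(c("1160", "27.3", "<1.2", "<0.030", ">500")))
res
```

Anchoring the pattern with "^" means only a leading symbol is treated as a censoring mark, which is slightly stricter than the unanchored grepl("<", ...) in the reply.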
Re: [R] Complex merging problems
I think the OP does not want to list duplicate records. Perhaps

> merge(unique(df1), df2, all.y=TRUE)
  v1 v2 ind
1  1 83   1
2  1 84   1
3  2 83  NA
4  2 84  NA
5  3 83  NA
6  3 84  NA
7  4 83  NA
8  4 84  NA

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of PIKAL Petr
Sent: Tuesday, January 13, 2015 2:14 AM
To: npretnar; r-help@r-project.org
Subject: Re: [R] Complex merging problems

Hi

I do not understand what you want to achieve with this.

> df2$v3 <- ifelse(df2$v1 %in% df1$v1 & df2$v2==df2$v1, 1, 0).

You compare v1 and v2 from data frame df2 to column v1 in data frame df1? It is true only in case where df2$v1 equals df2$v2.

In case you mean that you want to check equality of rows in both data frames you can use this

> df1$ind<-1
> merge(df1, df2, all.y=T)
   v1 v2 ind
1   1 83   1
2   1 83   1
3   1 84   1
4   1 84   1
5   2 83  NA
6   2 84  NA
7   3 83  NA
8   3 84  NA
9   4 83  NA
10  4 84  NA

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> npretnar
> Sent: Tuesday, January 13, 2015 7:07 AM
> To: r-help@r-project.org
> Subject: [R] Complex merging problems
>
> Hello,
>
> I have two data frames structured as follows:
>
> df1
>
> v1 v2
> 1 83
> 1 83
> 1 84
> 1 84
> 1 85
> 1 85
> 2 90
> 2 91
> 2 91
> 2 91
> 2 92
> 4 89
> 4 89
> 4 90
> 4 90
>
> df2
>
> v1 v2
> 1 83
> 2 83
> 3 83
> 4 83
> 1 84
> 2 84
> 3 84
> 4 84
>
> ... etc.
>
> I am trying to create an indicator variable in df2 to indicate whether
> the record is identified in df1. I just want to know if it appears
> once. The problem seems to be that df1 contains multiple records with
> the same data. I am attempting the following:
>
> df2$v3 <- ifelse(df2$v1 %in% df1$v1 & df2$v2==df2$v1, 1, 0).
>
> However, I get the following warning message:
>
> Warning message:
> In df2$v2 == df1$v1 :
> longer object length is not a multiple of shorter object length
>
> Nonetheless, the function outputs all 0's to df2$v3. If anybody has any
> suggestions with this, I would greatly appreciate it.
>
> Thanks,
>
> - Nick Pretnar
> npret...@gmail.com
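As an alternative to the merge() answers in this thread, a 0/1 indicator can be built by comparing whole rows rather than single columns. The sketch below is an editorial variant, not what either respondent posted: it pastes v1 and v2 into one key per row and tests membership with %in%, so duplicated rows in df1 do not matter. The data are retyped from the original post.

```r
# Data as posted in the thread.
df1 <- data.frame(
  v1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 4),
  v2 = c(83, 83, 84, 84, 85, 85, 90, 91, 91, 91, 92, 89, 89, 90, 90)
)
df2 <- expand.grid(v1 = 1:4, v2 = 83:84)   # same rows and order as the posted df2

# One key per row; the separator keeps pairs such as (1, 83) and (18, 3) distinct.
key1 <- paste(df1$v1, df1$v2, sep = "_")
key2 <- paste(df2$v1, df2$v2, sep = "_")

# 1 if the (v1, v2) pair of df2 occurs at least once in df1, otherwise 0.
df2$v3 <- as.integer(key2 %in% key1)
df2
```

Unlike merge(), this keeps df2 in its original row order and never adds rows.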
Re: [R] number of individuals where X=0 during all periods (longitudinal data)
Spend a little time with aggregate()

?aggregate

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of najuzz
Sent: Monday, December 22, 2014 7:45 AM
To: r-help@r-project.org
Subject: [R] number of individuals where X=0 during all periods (longitudinal data)

#Hi guys,
#I would like to count the number of individuals that receive X=0 throughout their observational period.
#example dataset:
ID<-c(1,1,1,1,2,2,3,3,3)
X<-c(0,1,2,1,0,0,0,0,0)
Time<-c(1,2,3,4,1,2,1,2,3)
Test<-data.frame(ID,X,Time)
# Individuals 2 and 3 have x=0 during all their periods. The count should hence equal two. I simply have
# no clue how R could solve this for me. As an add-on, I would also like to know the number of individuals
#that report X=0 during all periods plus have at least 3 weeks of observations. The answer would be one in
#this sample dataset.
#Thank you
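The reply only points at aggregate(), so here is one way the pointer could be followed. It is a sketch, not David's code, and it assumes that "at least 3 weeks of observations" means at least three rows for that ID.

```r
# Example data from the original post.
ID   <- c(1, 1, 1, 1, 2, 2, 3, 3, 3)
X    <- c(0, 1, 2, 1, 0, 0, 0, 0, 0)
Time <- c(1, 2, 3, 4, 1, 2, 1, 2, 3)
Test <- data.frame(ID, X, Time)

# One row per ID: TRUE if every X for that individual is 0.
all_zero <- aggregate(X ~ ID, data = Test, FUN = function(x) all(x == 0))

# One row per ID: number of observation periods (assumed to be the row count).
n_obs <- aggregate(Time ~ ID, data = Test, FUN = length)

# Both results are sorted by ID, so the rows line up.
n_all_zero      <- sum(all_zero$X)
n_all_zero_3obs <- sum(all_zero$X & n_obs$Time >= 3)

n_all_zero        # individuals with X = 0 in every period
n_all_zero_3obs   # ... who also have at least 3 observations
```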
Re: [R] combinations between two vectors
Depending on what you want, you probably want to start with expand.grid():

# All combinations of test with test
> pairs1 <- expand.grid(test, test)
> nrow(pairs1)
[1] 36

# Exclude cases that differ only in the order of the values
# E.g. (1, 5001), but not (5001, 1), also (1, 1), etc are included
> pairs2 <- pairs1[pairs1[,1] <= pairs1[,2],]
> nrow(pairs2)
[1] 21

# Same as pairs2 but (1, 1), etc are not included
> pairs3 <- pairs1[pairs1[,1] < pairs1[,2],]
> nrow(pairs3)
[1] 15

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, December 18, 2014 9:06 AM
To: Alaios
Cc: R-help@r-project.org
Subject: Re: [R] combinations between two vectors

I can't quite tell what you want: your example output is either unclear to me or mangled by posting in HTML (please don't).

Is expand.grid(test, test) what you want, or partway to what you want?

Sarah

On Thu, Dec 18, 2014 at 9:56 AM, Alaios via R-help wrote:
> Hi all, I am looking for a function that would give me all the combinations
> between two vectors. Let's take as example the
>
> test<-seq(1,3,by=5000)
> Browse[2]> test
> [1] 1 5001 10001 15001 20001 25001
> I want all the combinations between two times the test... I think this is
> called permutation so a function that could do permutation(test,test) and
> produce the following
> 1,11,50011,100011,15001
> 3,13,5001...25001,20001,25001,25001
> is there such a function ?
> Regards
> Alex
>
> [[alternative HTML version deleted]]
>

--
Sarah Goslee
http://www.functionaldiversity.org
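The counts in David's reply (36, 21 and 15) can be cross-checked with combn(), which generates the unordered pairs directly. This is only a sketch: the seq() call posted in the question does not produce the six values shown, so the definition of test below is an assumption chosen to reproduce the printed vector.

```r
# Assumption: six values matching the printed vector 1, 5001, ..., 25001.
test <- seq(1, 30000, by = 5000)

# All ordered pairs, including (x, x): 6 * 6 rows.
ordered_pairs <- expand.grid(first = test, second = test)

# Unordered pairs of distinct values: choose(6, 2) rows, one pair per row.
unordered_pairs <- t(combn(test, 2))

nrow(ordered_pairs)
nrow(unordered_pairs)
head(unordered_pairs, 3)
```

combn() avoids building the full grid first, which matters once the vector has more than a few thousand elements.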
Re: [R] Extract values from multiple lists
Something like

scens <- paste0("scen", 1:N)
new.df <- data.frame(sapply(scens, function(x) get(x)[["pop.inf.r"]]))

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of SH
Sent: Tuesday, December 16, 2014 11:06 AM
To: r-help
Subject: [R] Extract values from multiple lists

Dear List,

I hope this posting is not redundant. I have several list outputs with the same components. I ran a function with three different scenarios below (e.g., scen1, scen2, and scen3,...,scenN). I would like to extract the same components and group them as a data frame. For example,

pop.inf.r1 <- scen1[['pop.inf.r']]
pop.inf.r2 <- scen2[['pop.inf.r']]
pop.inf.r3 <- scen3[['pop.inf.r']]
...
pop.inf.rN <- scenN[['pop.inf.r']]
new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)

My final output would be 'new.df'. Could you help me how I can do that efficiently?

Thanks in advance,

Steve

P.S.: Below are some examples of summary outputs.

> summary(scen1)
                Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list

> summary(scen2)
                Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list

> summary(scen3)
                Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list

[[alternative HTML version deleted]]
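A variant of David's suggestion that avoids get() is to collect the scenario results in a named list first and then pull the component out with lapply(). The sketch below uses made-up stand-ins for scen1 to scen3 (the helper make_scen is hypothetical), and it assumes that each element of pop.inf.r holds a single number, which the summary output above suggests but does not prove.

```r
# Hypothetical stand-in for the poster's simulation output: a list whose
# pop.inf.r component is itself a list with one number per simulation run.
make_scen <- function(shift) {
  list(n.sim = 3, pop.inf.r = as.list(c(0.1, 0.2, 0.3) + shift))
}
scen1 <- make_scen(0)
scen2 <- make_scen(1)
scen3 <- make_scen(2)

# Collect the scenarios in one named list.
scen_list <- list(scen1 = scen1, scen2 = scen2, scen3 = scen3)

# Flatten the pop.inf.r component of each scenario into a numeric vector and
# bind the vectors as columns; column names come from the list names.
new.df <- as.data.frame(lapply(scen_list, function(s) unlist(s[["pop.inf.r"]])))
new.df
```

If the elements of pop.inf.r are longer than one value, unlist() would still run but the columns would no longer correspond to one row per simulation, so that assumption is worth checking on the real objects.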
Re: [R] dotplot axes labelling
You are very close. The scales argument, scales=list(y=list(...)), accepts several named components for the y axis, so you need to tell lattice how to use testylabels by naming the component (labels):

dotplot(testmatrix, scales=list(y=list(labels=testylabels)), xlab=NULL)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of r...@openmailbox.org
Sent: Monday, December 15, 2014 10:03 AM
To: r-help@r-project.org
Subject: [R] dotplot axes labelling

Subscribers,

What is my mistake with the following example:

library(lattice)
testmatrix<-matrix(c(1,2,3,4,3,6,12,24),nrow=4,ncol=2)
testylabels<-c('w1','x1','y1','z1')
dotplot(testmatrix, scales=list(y=list(testylabels)), xlab=NULL)
#testylabels not shown, instead 'D' 'C' 'B' 'A'

Thanks in advance.
Re: [R] create matrices with constraint
Actually there are not as many matrices as you suggest.

> comb <- combn(28, 4)
> dim(comb)
[1]     4 20475
> sum(comb[1,]==1)
[1] 2925
> comb[, 1]
[1] 1 2 3 4

There are 20,475 combinations, but you cannot choose any four to make a 4x7 matrix since each value can be used only once. The combn() function returns the combinations sorted, so we can get the number of combinations that contain 1 with sum(comb[1,]==1) and that is 2,925. The set of 4x7 matrices cannot use the same combination more than once, so 2,925 is the maximum possible number of matrices and there may be fewer.

As a first approach to finding them, you could take the first combination comb[, 1] which is 1, 2, 3, 4. Now add a second combination that does not include 1:4 and then a third combination that does not include any in the first two combinations and finally a fourth that does not include any in the first three combinations. Actually this is easy since we will just take 1:4, 5:8, 9:12, 13:16, 17:20, 21:24, 25:28.

> cols <- sapply(c(1, 5, 9, 13, 17, 21, 25), function(x)
+     head(which(comb[1,]==x), 1))
> cols
[1]     1  9850 15631 18656 19981 20406 20475
> comb[,cols]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    5    9   13   17   21   25
[2,]    2    6   10   14   18   22   26
[3,]    3    7   11   15   19   23   27
[4,]    4    8   12   16   20   24   28

But now it gets more complicated. While building the second matrix, we have to make sure that it does not use any combinations that have already been used. Combinations used on earlier matrices may be necessary to complete later matrices and that is why the number of sets may be less than 2,925. This sequential approach would guarantee to obtain matrices meeting the OP's criteria, but would not necessarily produce the maximum number of matrices possible.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of John McKown
Sent: Monday, December 15, 2014 9:23 AM
To: Kathryn Lord
Cc: r-help
Subject: Re: [R] create matrices with constraint

On Fri, Dec 12, 2014 at 11:00 AM, Kathryn Lord wrote:
> Dear all,
>
> Suppose that I have natural numbers 1 through 28.
>
> Based on these numbers, choose 4 numbers 7 times without replacement and
> make a 4 by 7 matrix, for example,
>
>

After a relaxing weekend, it came to me that these 4x7 matrices are really just a subset of all the possible permutations of the vector 1:28, recast as 4x7 matrices. Of course, there are factorial(28) (about 3*10^29) such 4x7 matrices. But given your constraints, I think that these can be subsetted to only those permutations in which the values in each row are sorted in ascending (or descending) order. I am fairly certain that this subset would be exhaustive for your purposes. I am not really certain how big that subset would be. I think it would be 1/168th (1 out of 7*factorial(4)) of the 3*10^29 permutations, or about 1.8*10^27. Which is still way too big to actually instantiate all at once. You might be able to store such a thing in a huge data base. If you're lucky, you have access to a massive supercomputer so that you can get the results before the heat death of this universe. (exaggeration?)

Two R libraries seem to address this. One is combinat. The other is permute. The permute library seems, to me, to be the more likely candidate. It contains a "how()" function which __appears to me__ to perhaps be a way to subset the permutations as they are being generated. But all that I get from reading the documentation is a bad headache. I never studied combinatorics. And I got a milder headache trying to read the Wikipedia article on it.

I am curious about what you will do with such a set of matrices, once you have them. If you are permitted to say.

--
While a transcendent vocabulary is laudable, one must be eternally careful so that the calculated objective of communication does not become ensconced in obscurity. In other words, eschew obfuscation.

Maranatha! <><
John McKown

[[alternative HTML version deleted]]
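The counts and the first matrix discussed in David's reply can be checked with a few lines of base R. This is a sketch of that one starting matrix only; it does not attempt the harder problem of enumerating further matrices, and the variable names starts and first_matrix are introduced here for illustration.

```r
# All sorted 4-subsets of 1:28, one per column.
comb <- combn(28, 4)

ncol(comb)             # total number of combinations, equal to choose(28, 4)
sum(comb[1, ] == 1)    # combinations that contain 1, equal to choose(27, 3)

# Smallest value of each of the seven blocks 1:4, 5:8, ..., 25:28.
starts <- seq(1, 25, by = 4)

# combn() lists combinations in lexicographic order, so the first column
# whose smallest value is x is (x, x+1, x+2, x+3).
cols <- sapply(starts, function(x) which(comb[1, ] == x)[1])
first_matrix <- comb[, cols]
first_matrix

# Every number from 1 to 28 is used exactly once.
all(sort(as.vector(first_matrix)) == 1:28)
```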
Re: [R] if else for cumulative sum error
Let's try a different approach. You don't need a loop for this. First we need a reproducible example: > set.seed(42) > dadosmax <- data.frame(above=runif(150) + .5) Now compute your sums using cumsum() and diff() and then compute enchday using ifelse(). See the manual pages for each of these functions to understand how they work: > sums <- diff(c(0, cumsum(dadosmax$above)), 45) > dadosmax$enchday <- c(ifelse(sums >= 45, 1, 0), rep(NA, 44)) > dadosmax$enchday [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [26] 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [76] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [101] 1 1 1 1 1 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA See the NA's? Those are what David Winsemius is talking about. For the 106th value, 106+44 is 150, but for the 107th value 107+144 is 151 which does not exist. Fortunately diff() understands that and stops at 106, but we have to add 44 NA's because that is the number of rows in your data frame. You might find this plot informative as well: > plot(sums, typ="l") > abline(h=45) Another way to get there is to use sapply() which will add the NA's for us: > sums <- sapply(1:150, function(x) sum(dadosmax$above[x:(x+44)])) > dadosmax$enchday <- ifelse(sums >= 45, 1, 0) But it won't be as fast if you have a large data set. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius Sent: Tuesday, December 2, 2014 2:50 PM To: Jefferson Ferreira-Ferreira Cc: r-help@r-project.org Subject: Re: [R] if else for cumulative sum error On Dec 2, 2014, at 12:26 PM, Jefferson Ferreira-Ferreira wrote: > Thank you for replies. > > David, > > I tried your modified form > > for (i in 1:seq_along(rownames(dadosmax))){ No. 
It is either 1: or seq_along(...). In this case perhaps 1:(nrow(dadosmax)-44) would be safer. You do not seem to have understood that you cannot use an index of i+44 when i is going to be the entire set of rows of the dataframe. There is "no there there" to quote Gertrude Stein's slur against Oakland. In fact there is no there there at i+1 when you get to the end. You either need to only go to row

> dadosmax$enchday[i] <- if ( (sum(dadosmax$above[i:(i+44)])) >= 45) 1 else 0
> }
>
> However, I'm receiving this warning:
> Warning message:
> In 1:seq_along(rownames(dadosmax)) :
> numerical expression has 2720 elements: only the first used
>
> I can't figure out why only the first row was calculated...

You should of course read these, but the error is not from your if-statement but rather from your for-loop indexing.

?'if'
?ifelse

> Any ideas?
>
> On Tue Dec 02 2014 at 15:22:25, John McKown wrote:
>
>> On Tue, Dec 2, 2014 at 12:08 PM, Jefferson Ferreira-Ferreira <jeco...@gmail.com> wrote:
>>
>>> Hello everybody;
>>>
>>> I'm writing a code where part of it is as follows:
>>>
>>> for (i in nrow(dadosmax)){
>>> dadosmax$enchday[i] <- if (sum(dadosmax$above[i:(i+44)]) >= 45) 1 else 0
>>> }
>>
>> Without some test data for any validation, I would try the following formula
>>
>> dadosmax$enchday[i] <- if (sum(dadosmax$above[i:(min(i+44,nrow(dadosmax)))]) >= 45) 1 else 0
>>
>>> That is, for each row of my data frame, sum a specific column (0 or 1) over
>>> that row plus the next 44 rows. If it is >= 45 then enchday is 1, else 0.
>>>
>>> The following error is returned:
>>>
>>> Error in if (sum(dadosmax$above[i:(i + 44)]) >= 45) 1 else 0 :
>>> missing value where TRUE/FALSE needed
>>>
>>> I've tested the ifelse statement assigning different values to i and it
>>> works. So I'm wondering if this error is due to the fact that at the end of
>>> my data frame there aren't 45 rows to sum anymore. I tried to use "try" but
>>> it simply hides the error.
>>>
>>> How can I deal with this? Any ideas?
>>> Thank you very much.
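For anyone finding this thread in the archive, here is a minimal sketch that checks the cumsum()/diff() shortcut against an explicit loop restricted to rows that have a full 45-row window. The data are made up in the same way as in the reply above, and the names window, fast and slow are only illustrative.

```r
# Rolling 45-row sums: vectorised version checked against a plain loop.
set.seed(42)
above  <- runif(150) + 0.5   # made-up data, same recipe as in the reply above
window <- 45

# Vectorised: differences of the cumulative sum at lag `window`
fast <- diff(c(0, cumsum(above)), lag = window)

# Explicit loop, stopping at the last row that still has a full window
n_full <- length(above) - window + 1
slow <- numeric(n_full)
for (i in seq_len(n_full)) {
  slow[i] <- sum(above[i:(i + window - 1)])
}

# Both approaches should agree up to floating point error
stopifnot(isTRUE(all.equal(fast, slow)))

# Pad with NA so the flag has one entry per original row
enchday <- c(as.integer(fast >= window), rep(NA_integer_, window - 1))
length(enchday)
```

The loop runs over seq_len(n_full) rather than 1:nrow(...), so the index i + window - 1 never goes past the end of the data.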
Re: [R] Creating submatrices from a dataframe, depending on factors in sample names
I may have misunderstood, but does this do what you want?

> df.mat <- as.matrix(df)
> same <- lapply(1:3, function(x) df.mat[grep(paste0("_", x),
+     rownames(df.mat)), grep(paste0("_", x), colnames(df.mat))])
> same
[[1]]
           HQ673618_1 HQ674317_1 EU686630_1
HQ673618_1         NA       90.8       89.8
HQ674317_1       90.8         NA       98.6
EU686630_1       89.8       98.6         NA

[[2]]
           EU686593_2 JN166322_2 EU491340_2
EU686593_2         NA       98.1       96.8
JN166322_2       98.1         NA       97.5
EU491340_2       96.8       97.5         NA

[[3]]
           AB694259_3 AB694258_3 AB694462_3
AB694259_3         NA       98.3       95.9
AB694258_3       98.3         NA       95.8
AB694462_3       95.9       95.8         NA

> Diff <- as.matrix(expand.grid(1:3, 1:3))
> Diff <- Diff[Diff[,1] < Diff[,2], ]
> different <- lapply(seq_len(nrow(Diff)), function(x)
+     df.mat[grep(paste0("_", Diff[x,1]), rownames(df.mat)),
+     grep(paste0("_", Diff[x,2]), colnames(df.mat))])
> different
[[1]]
           EU686593_2 JN166322_2 EU491340_2
HQ673618_1       89.6       89.8       88.9
HQ674317_1       97.7       98.4       97.4
EU686630_1       98.4       98.9       97.7

[[2]]
           AB694259_3 AB694258_3 AB694462_3
HQ673618_1       87.8       88.2       88.3
HQ674317_1       94.9       96.2       95.1
EU686630_1       95.4       96.4       95.8

[[3]]
           AB694259_3 AB694258_3 AB694462_3
EU686593_2       94.4       95.6       94.8
JN166322_2       95.3       96.5       95.9
EU491340_2       96.5       97.7       96.0

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Monday, December 1, 2014 11:46 AM
To: Tim Richter-Heitmann
Cc: r-help@r-project.org
Subject: Re: [R] Creating submatrices from a dataframe, depending on factors in sample names

I do not have the patience to study your request carefully, but does the following help?
> a <- 1:3 > x <- outer(a,a,paste,sep=".") > x [,1] [,2] [,3] [1,] "1.1" "1.2" "1.3" [2,] "2.1" "2.2" "2.3" [3,] "3.1" "3.2" "3.3" > x[upper.tri(x)] [1] "1.2" "1.3" "2.3" > x[upper.tri(x,diag=TRUE)] [1] "1.1" "1.2" "2.2" "1.3" "2.3" "3.3" This gives you a vector all possible pairs (including identical pairs or not) of values of a, which you could then loop over as an index to do what you want, I think. If this is not what you want, just ignore without replying. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Mon, Dec 1, 2014 at 8:47 AM, Tim Richter-Heitmann wrote: > Hello there, > > this is a cross-post of a stack-overflow question, which wasnt answered, but > is very important for my work. Apologies for breaking any rules, but i do > hope for some help from the list instead: > > I have a huge matrix of pairwise similarity percentages between different > samples. The samples are belonging to groups. The groups are determined by > the suffix "_n" in the row.names/header names. > In the first step, i wanted to create submatrices consisting of all pairs > within single groups (i.e. for all samples from "_1"). > However, I realized that i need to know all pairwise submatrices, between > all combination of groups. So, i want to create (a list of) vectors that are > named "_n1 vs _n2" (or similar) for all combinations of n, as illustrated by > the colored rectangulars: > > http://i.stack.imgur.com/XMkxj.png > > Reproducible code, as provided by helpful Stack Overflow members, dealing > with identical "_n"s. 
> > > df <- structure(list(HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8, > 88.9, > 87.8, 88.2, 88.3), HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4, > 97.4, 94.9, 96.2, 95.1), EU686630_1 = c(89.8, 98.6, NA, 98.4, > 98.9, 97.7, 95.4, 96.4, 95.8), EU686593_2 = c(89.6, 97.7, 98.4, > NA, 98.1, 96.8, 94.4, 95.6, 94.8), JN166322_2 = c(89.8, 98.4, > 98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9), EU491340_2 = c(88.9, > 97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96), AB694259_3 = c(87.8, > 94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9), AB694258_3 = c(88.2, > 96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8), AB694462_3 = c(88.3, > 95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)), .Names = > c("HQ673618_1", > &qu
Re: [R] Converting list to character
Or just modify your aggregate() command:

> TAB <- aggregate(mydata$CODE, by=list(ID=mydata$ID,
+     YEAR=mydata$YEAR), FUN=paste0, collapse=", ")
> TAB
     ID YEAR              x
1   986 2008         GR.3.8
2  1251 2008 GR.3.1, GR.3.8
3  1801 2008         GR.3.8
4    11 2009         GR.3.7
5   986 2009         GR.3.8
6  1251 2009 GR.3.1, GR.3.8
7  1801 2009         GR.3.8
8    11 2010         GR.3.7
9   460 2010         GR.3.1
10  986 2010         GR.3.8
11 1251 2010 GR.3.1, GR.3.8
12 1801 2010         GR.3.8
13  460 2011         GR.3.1
14  986 2011         GR.3.8
15 1251 2011 GR.3.1, GR.3.8
16 1801 2011         GR.3.8

-------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lee, Chel Hee
Sent: Tuesday, November 25, 2014 11:23 AM
To: Massimiliano Tripoli; r-help@r-project.org
Subject: Re: [R] Converting list to character

> do.call("rbind", TAB$x)
   [,1]     [,2]
1  "GR.3.8" "GR.3.8"
2  "GR.3.1" "GR.3.8"
4  "GR.3.8" "GR.3.8"
5  "GR.3.7" "GR.3.7"
6  "GR.3.8" "GR.3.8"
7  "GR.3.1" "GR.3.8"
9  "GR.3.8" "GR.3.8"
10 "GR.3.7" "GR.3.7"
11 "GR.3.1" "GR.3.1"
12 "GR.3.8" "GR.3.8"
13 "GR.3.1" "GR.3.8"
15 "GR.3.8" "GR.3.8"
16 "GR.3.1" "GR.3.1"
17 "GR.3.8" "GR.3.8"
18 "GR.3.1" "GR.3.8"
20 "GR.3.8" "GR.3.8"
>

Is this what you are looking for? I hope this helps.

Chel Hee Lee

On 11/25/2014 6:07 AM, Massimiliano Tripoli wrote:
>
> Dear all,
>
> I can't convert the result of aggregate function in a dataframe.
My data > looks like: > > mydata <- structure(list(ID = c(11, 11, 460, 460, 986, 986, 986, 986, 1251, > 1251, 1251, 1251, 1251, 1251, 1251, 1251, 1801, 1801, 1801, 1801 > ), YEAR = c(2009, 2010, 2010, 2011, 2008, 2009, 2010, 2011, 2008, > 2008, 2009, 2009, 2010, 2010, 2011, 2011, 2008, 2009, 2010, 2011 > ), Y = c(158126, 153015, 3701, 5880, 718663, 661112, 527233, > 558281, 450, 131714, 427, 124648, 425, 116500, 434, 123853, 17400, > 16493, 8057, 8329), CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.1", > "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", > "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.8", "GR.3.8", > "GR.3.8", "GR.3.8")), .Names = c("ID", "YEAR", "Y", "CODE"), row.names = c(NA, > 20L), class = "data.frame") > > and by using aggregate function > > TAB <- > aggregate(mydata$CODE,by=list(ID=mydata$ID,YEAR=mydata$YEAR),FUN=paste0) > > What I want is a dataframe like of printing TAB: >> TAB > ID YEAR x > 1 986 2008 GR.3.8 > 2 1251 2008 GR.3.1, GR.3.8 > 3 1801 2008 GR.3.8 > 411 2009 GR.3.7 > 5 986 2009 GR.3.8 > 6 1251 2009 GR.3.1, GR.3.8 > 7 1801 2009 GR.3.8 > 811 2010 GR.3.7 > 9 460 2010 GR.3.1 > 10 986 2010 GR.3.8 > 11 1251 2010 GR.3.1, GR.3.8 > 12 1801 2010 GR.3.8 > 13 460 2011 GR.3.1 > 14 986 2011 GR.3.8 > 15 1251 2011 GR.3.1, GR.3.8 > 16 1801 2011 GR.3.8 > >> str(TAB)[1:10] > 'data.frame':16 obs. of 3 variables: > $ ID : num 986 1251 1801 11 986 ... > $ YEAR: num 2008 2008 2008 2009 2009 ... > $ x :List of 16 >..$ 1 : chr "GR.3.8" >..$ 2 : chr "GR.3.1" "GR.3.8" >..$ 4 : chr "GR.3.8" >..$ 5 : chr "GR.3.7" >..$ 6 : chr "GR.3.8" >..$ 7 : chr "GR.3.1" "GR.3.8" >..$ 9 : chr "GR.3.8" >..$ 10: chr "GR.3.7" >..$ 11: chr "GR.3.1" >..$ 12: chr "GR.3.8" >..$ 13: chr "GR.3.1" "GR.3.8" >..$ 15: chr "GR.3.8" >..$ 16: chr "GR.3.1" >..$ 17: chr "GR.3.8" >..$ 18: chr "GR.3.1" "GR.3.8" >..$ 20: chr "GR.3.8" > NULL > > As you can see the "x" coloumn is a list and I would want to change it to > character variable. > Anyone may help me? 
> Thanks,
>
> Massimiliano
>
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
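As a follow-up sketch for this thread: the list column appears because the groups return vectors of different lengths, and collapsing each group to a single string gives a plain character column. The small data frame below is a made-up subset in the shape of the poster's data; TAB_list, TAB_chr and fixed are illustrative names only.

```r
# Made-up subset with the same columns as the poster's data
mydata <- data.frame(
  ID   = c(11, 11, 1251, 1251, 1251, 1251),
  YEAR = c(2009, 2010, 2008, 2008, 2009, 2009),
  CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8"),
  stringsAsFactors = FALSE
)
groups <- list(ID = mydata$ID, YEAR = mydata$YEAR)

# Without collapse, groups of unequal size give a list column
TAB_list <- aggregate(mydata$CODE, by = groups, FUN = paste0)
is.list(TAB_list$x)

# With collapse, each group becomes one string, so x is character
TAB_chr <- aggregate(mydata$CODE, by = groups, FUN = paste0, collapse = ", ")
str(TAB_chr)

# An existing list column can also be repaired after the fact
fixed <- vapply(TAB_list$x, paste, character(1), collapse = ", ")
```

The vapply() route is useful when the aggregate() call cannot easily be rerun.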
Re: [R] Rose Diagrams for Geology
No. Just use the circular() function to specify that your data are in degrees and clockwise and the graph will be labeled that way.

David C

(I was beginning to think that this thread was only for Davids).

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jwd
Sent: Friday, November 21, 2014 1:58 AM
To: r-help@r-project.org
Subject: Re: [R] Rose Diagrams for Geology

On Tue, 18 Nov 2014 22:06:03 -0600 David Doyle wrote:

> Thank you to David and David for their help. The code below
> generated what I needed.
>
> library(circular)
> mydata <- read.table("http://doylesdartden.com/R/Joints.csv",
>     header=TRUE, sep=",")
> x <- circular(mydata$JointsRad)
> rose.diag(x,
>     # Set point character to use
>     pch = 20,
>     # sets font size
>     cex = 1,
>     # parameter that controls the size of the circle.
>     # 1 = default, <1 makes it larger, >1 makes it smaller
>     shrink = 1,
>     # the color for filling the rose diagram.
>     col=2,
>     prop = 2,
>     # number of bins. 36 = 10 degrees each. 18 = 20 degrees each
>     bins=36,
>     # Ticks showing bins
>     ticks=TRUE,
>     # Units.
>     units="degrees",
>     # list main title
>     main="Rose Diagram of XXX")
> # for more info see
> http://www.inside-r.org/packages/cran/circular/docs/rose.diag

I've been following this thread with some interest. One problem that I might have with the code above is that, as it is, the plot is labeled with 0 degrees to the left and numbered counterclockwise (standard trigonometric format). Most field mapping data I have collected has been either in quadrant form (rarely) or more commonly in azimuthal form (0-360 degrees, ordered clockwise from the top). Is that an issue?

jwdougherty
Re: [R] Rose Diagrams for Geology
Look at circular more carefully. It accepts both degrees and radians, but you have to create a circular object with circular() to specify what kind of circular data you have. Then you can plot and get circular statistics on your data. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Doyle Sent: Tuesday, November 18, 2014 3:42 PM To: r-help@r-project.org Subject: [R] Rose Diagrams for Geology Hello everyone, In geology we often do rose diagrams showing the number of features along a certain compass direction within a given range (bin) of angle (0-180 degrees). I was wondering if anybody has had experience with this in R and if they could recommend a package. I looked at the circular package but it seems to deal only in radian and we normally use degrees. I've also looked a little at openair being rose diagrams are often used for wind directions. Any suggestions / guidance would be greatly appreciated. Thank you for your time. David Doyle [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
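A small sketch of the advice in this thread, for compass-style data. It assumes the circular package is installed and that its circular() function accepts units = "degrees" and template = "geographics" (zero at the top, clockwise rotation); please check ?circular for your installed version. The azimuth values are invented.

```r
# Hypothetical joint azimuths in degrees, measured clockwise from north
az <- c(10, 35, 40, 95, 170, 185, 220, 350)

if (requireNamespace("circular", quietly = TRUE)) {
  # Declare the data as degrees on a compass-style (geographic) template
  x <- circular::circular(az, units = "degrees", template = "geographics")
  # 36 bins of 10 degrees each
  circular::rose.diag(x, bins = 36, prop = 2, main = "Rose diagram (sketch)")
} else {
  message("Package 'circular' is not installed; skipping the plot.")
}
```

Declaring the units and template when the circular object is created means the plotting and summary functions do not need to be told again.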
Re: [R] Help for x axis
This should get you started: > aggData <- aggregate(age~smoke+gender, sampleData, mean) > aggData smoke gender age 1 0 0 39.47231 2 1 0 40.09020 3 0 1 39.59814 4 1 1 42.04092 > plotInfo <- barplot(aggData$age) > axis(1, c(0, plotInfo), c("Gender", "-", "-", "+", "+"), line=.75, + lwd=0, cex.axis=1.25, xpd=TRUE) > axis(1, c(0, plotInfo), c("Smoke", "-", "+", "-", "+"), line=2, + lwd=0, cex.axis=1.25, xpd=TRUE) To adapt it you will need to read the manual pages for barplot() and axis() and the page on graphical parameters par(). In particular, you will have to allocate more space at the bottom of the plot if you want to add more lines. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Olivier Sent: Monday, November 17, 2014 9:24 AM To: r-help@r-project.org Subject: [R] Help for x axis Hi, I want to customize x axis to scientific data. I do experiments with different triggers. As others publications, I want that there is one line for each trigger with the sign "-" or "+" to show if the trigger is used or no. You will find attached an exemple. Please find below a data.frame you could use to explain me. Thank you for your response, Olivier set.seed(3) sampleData <- data.frame(id = 1:100,gender = sample(c("0", "1"), 100, replace = TRUE), smoke = sample (c("0","1"), 100, replace=TRUE), age = rnorm(100, 40, 10)) summary(sampleData) -> I want to give results with histograms or box.plot (age according to sex and smoking status) -> x axis may be like something like this : Gender- - ++ Smoke - +- + __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help for x axis
Try these:

aggData <- aggregate(age~smoke+gender, sampleData, function(x) c(mean=mean(x), stderr=sd(x)/sqrt(length(x))))
aggData
plotInfo <- barplot(aggData$age[,1], ylim=c(0, max(rowSums(aggData$age))))
axis(1, c(0, plotInfo), c("Gender", "-", "-", "+", "+"), line=.75, lwd=0, cex.axis=1.25, xpd=TRUE)
axis(1, c(0, plotInfo), c("Smoke", "-", "+", "-", "+"), line=2, lwd=0, cex.axis=1.25, xpd=TRUE)
top <- aggData$age[,1]+aggData$age[,2]
bottom <- aggData$age[,1]-aggData$age[,2]
arrows(plotInfo, bottom, plotInfo, top, length=.15, angle=90, code=3)

boxplot(age~smoke+gender, sampleData, xaxt="n")
axis(1, 0:4, c("Gender", "-", "-", "+", "+"), line=.75, lwd=0, cex.axis=1.25, xpd=TRUE)
axis(1, 0:4, c("Smoke", "-", "+", "-", "+"), line=2, lwd=0, cex.axis=1.25, xpd=TRUE)

David

-Original Message-
From: Olivier [mailto:olivier.lerou...@ymail.com]
Sent: Monday, November 17, 2014 4:39 PM
To: David L Carlson
Subject: Re: [R] Help for x axis

Thank you very much, it is all I want to do. Is it also possible to show error bars, or to do this with a boxplot?

Best regards,
Olivier Le Rouzic

On 2014-11-17, 4:07 PM, David L Carlson wrote:
> This should get you started:
>
>> aggData <- aggregate(age~smoke+gender, sampleData, mean)
>> aggData
>   smoke gender      age
> 1     0      0 39.47231
> 2     1      0 40.09020
> 3     0      1 39.59814
> 4     1      1 42.04092
>> plotInfo <- barplot(aggData$age)
>> axis(1, c(0, plotInfo), c("Gender", "-", "-", "+", "+"), line=.75,
> + lwd=0, cex.axis=1.25, xpd=TRUE)
>> axis(1, c(0, plotInfo), c("Smoke", "-", "+", "-", "+"), line=2,
> + lwd=0, cex.axis=1.25, xpd=TRUE)
>
> To adapt it you will need to read the manual pages for barplot() and axis()
> and the page on graphical parameters par(). In particular, you will have to
> allocate more space at the bottom of the plot if you want to add more lines.
> > - > David L Carlson > Department of Anthropology > Texas A&M University > College Station, TX 77840-4352 > > > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of Olivier > Sent: Monday, November 17, 2014 9:24 AM > To: r-help@r-project.org > Subject: [R] Help for x axis > > Hi, > I want to customize x axis to scientific data. I do experiments with > different triggers. As others publications, I want that there is one > line for each trigger with the sign "-" or "+" to show if the trigger is > used or no. You will find attached an exemple. > Please find below a data.frame you could use to explain me. > Thank you for your response, > Olivier > > > > set.seed(3) > sampleData <- data.frame(id = 1:100,gender = sample(c("0", "1"), 100, > replace = TRUE), smoke = sample (c("0","1"), 100, replace=TRUE), age = > rnorm(100, 40, 10)) > summary(sampleData) > > -> I want to give results with histograms or box.plot (age according to > sex and smoking status) > -> x axis may be like something like this : > > Gender- - ++ > Smoke - +- + > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
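On the remark about allocating more space at the bottom: a minimal sketch, assuming base graphics, that widens the bottom margin with par(mar=) and then writes one label row per trigger in a loop. The labs list and the margin size of 7 lines are illustrative choices, not something from the original posts.

```r
set.seed(3)
sampleData <- data.frame(
  gender = sample(c("0", "1"), 100, replace = TRUE),
  smoke  = sample(c("0", "1"), 100, replace = TRUE),
  age    = rnorm(100, 40, 10)
)
aggData <- aggregate(age ~ smoke + gender, sampleData, mean)

# Reserve 7 lines of margin at the bottom (the default is about 5)
old <- par(mar = c(7, 4, 2, 1))
mids <- barplot(aggData$age, ylab = "Mean age")

# One row of -/+ labels per trigger; add more entries for more triggers
labs <- list(Gender = c("-", "-", "+", "+"),
             Smoke  = c("-", "+", "-", "+"))
for (k in seq_along(labs)) {
  axis(1, at = c(0, mids), labels = c(names(labs)[k], labs[[k]]),
       line = k, lwd = 0, xpd = TRUE)
}
par(old)  # restore the previous margins
```

Each extra trigger needs roughly one more margin line, so the first element of mar should grow with length(labs).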
Re: [R] Help with ddply/summarize
I think this is what you want:

> MyVar <- 1:10
> MyVar
 [1]  1  2  3  4  5  6  7  8  9 10
> mean(MyVar)
[1] 5.5
> txt <- "MyVar"
> mean(txt)
[1] NA
Warning message:
In mean.default(txt) : argument is not numeric or logical: returning NA
> mean(get(txt))
[1] 5.5

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Posner
Sent: Thursday, November 13, 2014 5:32 PM
To: 'r-help@r-project.org'
Subject: [R] Help with ddply/summarize

I have a straightforward application of ddply() and summarize():

ddply(MyFrame, .(Treatment, Week), summarize, MeanValue=mean(MyVar))

This works just fine:

  Treatment     Week MeanValue
1    MyDrug BASELINE      5.91
2    MyDrug   WEEK 1      4.68
3    MyDrug   WEEK 2      4.08
4    MyDrug   WEEK 3      3.67
5    MyDrug   WEEK 4      2.96
6    MyDrug   WEEK 5      2.57
7    MyDrug   WEEK 6      2.50
8   Placebo BASELINE      8.58
9   Placebo   WEEK 1      8.25
...

But I want to specify the variable (MyVar) as a character string:

ddply(MyFrame, .(Treatment, Week), summarize, MeanValue=mean("MyVar"))

(Actually, the character string "MyVar" will be selected from a vector of character strings.)

The code above produces no joy:

  Treatment     Week MeanValue
1    MyDrug BASELINE        NA
2    MyDrug   WEEK 1        NA
3    MyDrug   WEEK 2        NA
4    MyDrug   WEEK 3        NA
...

I tried a few things, including:

as.name("MyVar")
as.quoted("MyVar")

... but they all produced the same results: NAs

I'm obviously thrashing around in the dark! Any advice would be greatly appreciated.

-John
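A possible alternative when the column name arrives as a character string: the sketch below uses base aggregate() with [[ ]] indexing instead of ddply(), so it is a swapped-in technique rather than the poster's plyr call. MyFrame here is invented data with the same column names as in the question.

```r
# Invented data with the same column names as in the question
MyFrame <- data.frame(
  Treatment = rep(c("MyDrug", "Placebo"), each = 4),
  Week      = rep(c("BASELINE", "WEEK 1"), times = 4),
  MyVar     = c(6, 5, 5, 4, 9, 8, 8, 8)
)

txt <- "MyVar"   # the column name, held as a string

# [[ ]] looks the column up by name, so no get() or quoting tricks are needed
res <- aggregate(MyFrame[[txt]],
                 by = list(Treatment = MyFrame$Treatment, Week = MyFrame$Week),
                 FUN = mean)
names(res)[3] <- "MeanValue"
res
```

To run this over a vector of column names, the same call can be wrapped in lapply(), with txt as the loop variable.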
Re: [R] factor levels > numeric values
Also look at the Frequently Asked Questions document that comes with your R installation: 7.10 How do I convert factors to numeric? It may happen that when reading numeric data into R (usually, when reading in a file), they come in as factors. If f is such a factor object, you can use as.numeric(as.character(f)) to get the numbers back. More efficient, but harder to remember, is as.numeric(levels(f))[as.integer(f)] In any case, do not call as.numeric() or their likes directly for the task at hand (as as.numeric() or unclass() give the internal codes). - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gerrit Eichner Sent: Wednesday, November 12, 2014 8:06 AM To: David Studer Cc: r-help@r-project.org Subject: Re: [R] factor levels > numeric values Hello, David, take a look at the beginning of the "Warning" section of ?factor. Hth -- Gerrit > Hi everybody, > > I have another question (to which I could not find an answer in my r-books. > I am sure, it's not a great issue, but I simply lack of a good idea how to > solve this: > > One of my variables gets imported as a factor instead of a numeric variable. > Now I have a... > Factor w/ 63 levels "0","0.02","0.03",..: 1 NA NA 1 NA NA 1 1 53 10 ... > > How can I transform these factor levels into actual values? > > Thank you very much for any help! > David > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
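To make the FAQ entry concrete, here is a short sketch on an invented factor showing what goes wrong with a direct as.numeric() call and the two conversions the FAQ recommends.

```r
# A numeric column that was read in as a factor (invented values)
f <- factor(c("0", "0.02", "0.03", "0", NA, "10.5"))

wrong  <- as.numeric(f)                      # internal level codes, not the values
right  <- as.numeric(as.character(f))        # go through the labels
right2 <- as.numeric(levels(f))[as.integer(f)]  # convert levels once, then index

wrong
right
```

The second form converts each distinct level only once, which is why the FAQ calls it more efficient for long factors.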
Re: [R] Counting within groups / means by groups
In addition to Jeff's recommendation, you need to read a basic introduction to R. Your data frame is probably not what you think it is: > group<-c("A", "A", "A", "B", "B", "B", "B", "C") > value<-c(1,3,2,2,2,4,4,1) > df<-as.data.frame(cbind(group, value)) > str(df) 'data.frame': 8 obs. of 2 variables: $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3 $ value: Factor w/ 4 levels "1","2","3","4": 1 3 2 2 2 4 4 1 By using cbind() you combined a character vector and a numeric vector into a matrix so R converted the numeric value to characters since a matrix can hold only a single data type. The cbind() function is generic and which version you get depends on the first argument. > cbind(group, value) group value [1,] "A" "1" [2,] "A" "3" [3,] "A" "2" [4,] "B" "2" [5,] "B" "2" [6,] "B" "4" [7,] "B" "4" [8,] "C" "1" Then you used as.data.frame() to convert the character matrix to a data.frame. The default for character variables is to convert those to factors. All you need is > dfa <- data.frame(group, value) > str(dfa) 'data.frame': 8 obs. of 2 variables: $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3 $ value: num 1 3 2 2 2 4 4 1 I changed df to dfa since df() is the density function for the f distribution. R is not likely to get confused, but you might. Then read the manual page on ave() to see why these work and how to adapt them: > ave(dfa$value, dfa$group, FUN=length) [1] 3 3 3 4 4 4 4 1 > ave(dfa$value, dfa$group) [1] 2 2 2 3 3 3 3 1 - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Newmiller Sent: Monday, November 10, 2014 9:19 AM To: stude...@gmail.com; r-help@r-project.org Subject: Re: [R] Counting within groups / means by groups Help file ?ave should apply here. 
Please read the Posting Guide mentioned in the footer of every email on this list and on the list manager page for this mailing list. It warns you to read the archives before posting and to post in plain text format rather than HTML format. --- Jeff NewmillerThe . . Go Live... DCN:Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On November 10, 2014 6:39:47 AM PST, David Studer wrote: >Hi everyone! > >I have problems finding a solution to the following two problems: > >My sample-dataframe consists of two variables "group" and "value": > >group<-c("A", "A", "A", "B", "B", "B", "B", "C") >value<-c(1,3,2,2,2,4,4,1) >df<-as.data.frame(cbind(group, value)) > >Problem 1: >** > >Now I'd like to count the number of group-A-cases, group-B-cases etc >and >write >this number into a new column. It should be like: > >count_group<-c(3, 3, 3, 4, 4, 4, 4, 1) > >Problem 2: >*** > >I'd like to add new column with the mean values (or any other function) >within >my groups. E.g: > >Group A: (1+3+2)/3=2 >Group B: (2+2+4+4)/4=3 >Group C: =1 > >Now I'd add another column 2 2 3 3 3 3 1 > > >Can anyone help me, how this can be done best? > >Thank you! >David > > [[alternative HTML version deleted]] > >__ >R-help@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
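Putting the two answers in this thread together, a minimal sketch that adds both requested columns with ave(), plus a third using a different summary function to show how FUN= generalises. The column names count_group, mean_group and median_group are just illustrative.

```r
group <- c("A", "A", "A", "B", "B", "B", "B", "C")
value <- c(1, 3, 2, 2, 2, 4, 4, 1)

# data.frame() keeps value numeric; cbind() would turn it into text
dfa <- data.frame(group, value)

dfa$count_group  <- ave(dfa$value, dfa$group, FUN = length)  # Problem 1
dfa$mean_group   <- ave(dfa$value, dfa$group)                # Problem 2 (mean is the default)
dfa$median_group <- ave(dfa$value, dfa$group, FUN = median)  # any other function
dfa
```

ave() returns a vector as long as its input, which is why the result can be assigned straight back as a new column.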
Re: [R] limit of cmdscale function
You avoid the call to cmdscale() by supplying your own starting configuration (see the manual page for the y= argument). You could still hit other barriers within isoMDS() or insufficient memory on your computer. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kawashima, Masayuki Sent: Wednesday, November 5, 2014 10:51 PM To: r-help@r-project.org Subject: [R] limit of cmdscale function Hi We have a few questions regarding the use of the "isoMDS" function. When we run "isoMDS" function using 60,000 x 60,000 data matrix, we get the following error message: cmdscale(d, k) : invalid value of 'n' Calls: isoMDS -> cmdscale We checked the source code of "cmdscale" and found the following limitation: ## we need to handle nxn internally in dblcen if(is.na(n) || n > 46340) stop("invalid value of 'n'") 1. This cmdscale limitation ('n > 46340') is due to the limitation of BLAS and LAPACK variables(int4) which can only handle '2^31-1' amount of data? 2. Is there any workaround to run isoMDS using large data (i.e. greater than 46340)? We would like to run isoMDS using a maximum of 150,000x150,000 data matrix. Best regards Masayuki Kawashima Email: kawasim...@jp.fujitsu.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
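A small sketch of the y= suggestion, assuming the MASS package (which normally ships with R) is available. The points are random and the starting configuration y0 is also random, so the fit may be poorer than one started from cmdscale(); this only shows that supplying y= skips the cmdscale() call. The first line shows where the 46340 limit plausibly comes from, since n * n must fit in a signed 32-bit integer.

```r
library(MASS)

# Largest n for which n * n still fits in a 32-bit signed integer
floor(sqrt(.Machine$integer.max))

set.seed(1)
pts <- matrix(rnorm(200), ncol = 2)   # 100 made-up points in 2 dimensions
d   <- dist(pts)

# Our own starting configuration, so the default y = cmdscale(d, k) is never evaluated
y0  <- matrix(rnorm(200), ncol = 2)
fit <- isoMDS(d, y = y0, k = 2, trace = FALSE)

dim(fit$points)
fit$stress
```

Memory remains a separate problem: a dist object for 150,000 items holds about 1.1e10 doubles, which is on the order of 90 GB.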
Re: [R] reading data from a web
You did not read the data with the commands you provided since c1 is not defined, so read.fwf() fails immediately. Here is a solution that works for the link you provided, but would need to be modified for months that do not have 30 days:

> lnk <- "http://www.data.jma.go.jp/gmd/env/data/radiation/data/geppo/201004/DR201004_sap.txt"
> raw <- readLines(lnk)                # Read the file as text lines
> raw <- raw[19:48]                    # Pull out the data
> raw <- substr(raw, 16, nchar(raw))   # Strip the leading blanks
> raw <- gsub("  +", ",", raw)         # Replace two or more blanks with a comma
> raw <- gsub("\\.\\.\\.", "NA", raw)  # Replace ... with NA
> Solar <- read.csv(text=raw, header=FALSE, colClasses=c("character",
+     rep("numeric", 25)))
> str(Solar)
'data.frame': 30 obs. of 26 variables:
 $ V1 : chr "4 1" "4 2" "4 3" "4 4" ...
 $ V2 : num NA NA NA NA NA NA NA NA NA NA ...
 $ V3 : num NA NA NA NA NA NA NA NA NA NA ...
 $ V4 : num NA NA NA NA NA NA NA NA NA NA ...
 $ V5 : num NA NA NA NA NA NA NA NA NA NA ...
 $ V6 : num NA NA NA NA NA NA NA NA NA NA ...
 $ V7 : num 0 0 0 2 0 8 0 75 2 0 ...
 $ V8 : num 0 0 17 133 0 27 36 218 1 1 ...
 $ V9 : num 0 98 29 205 0 23 4 280 1 0 ...
 $ V10: num 2 190 62 100 0 9 0 310 7 12 ...
 $ V11: num 0 237 49 227 86 9 0 321 0 0 ...
 $ V12: num 0 303 21 151 177 13 1 304 52 0 ...
 $ V13: num 0 286 72 199 131 8 2 320 33 6 ...
 $ V14: num 0 318 203 284 30 1 102 285 9 130 ...
 $ V15: num 0 314 241 282 10 0 43 286 93 107 ...
 $ V16: num 1 270 171 256 6 1 0 272 181 27 ...
 $ V17: num 3 190 100 214 34 0 11 255 177 0 ...
 $ V18: num 0 89 69 129 24 0 8 205 138 0 ...
 $ V19: num 0 7 2 27 2 0 0 80 30 0 ...
 $ V20: num 0 0 0 0 0 0 0 0 0 0 ...
 $ V21: num NA NA NA NA NA NA NA NA NA NA ...
 $ V22: num NA NA NA NA NA NA NA NA NA NA ...
 $ V23: num NA NA NA NA NA NA NA NA NA NA ...
 $ V24: num NA NA NA NA NA NA NA NA NA NA ...
 $ V25: num NA NA NA NA NA NA NA NA NA NA ...
 $ V26: num 6 2302 1036 2209 500 ...
- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alemu Tadesse Sent: Wednesday, October 29, 2014 2:21 PM To: r-help@r-project.org Subject: [R] reading data from a web Dear All, I have data of the format shown in the link http://www.data.jma.go.jp/gmd/env/data/radiation/data/geppo/201004/DR201004_sap.txt that I need to read. I have downloaded all the data from the link and I have it on my computer. I used the following script (got it from web) and was able to read the data. But, it is not in the format that I wanted it to be. I want it a data frame and clean numbers. asNumeric <- function(x) as.numeric(as.character(x)) factorsNumeric <- function(data) modifyList(data, lapply(data[, sapply(data, is.logical)],asNumeric)) data=read.fwf(filename, widths=c(c1),skip=18, header=FALSE) data$V2<-as.numeric(gsub(" ","", as.character(data$V2) , fixed=TRUE)) f <- factorsNumeric(data) Any help is appreciated. Best, Alemu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
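The same clean-up steps can be tried offline on a couple of lines. This is only a sketch: the two text lines below are invented to mimic the layout described in the reply (a "month day" label with a single blank inside it, fields separated by two or more blanks, and ... for missing values), not copied from the real file.

```r
# Two invented lines in the same spirit as the data file
raw <- c("  4 1  ...  0  17  6",
         "  4 2  ...  98  190  2302")

raw <- trimws(raw)                        # drop leading/trailing blanks
raw <- gsub(" {2,}", ",", raw)            # two or more blanks become a comma
raw <- gsub("...", "NA", raw, fixed = TRUE)  # literal ... becomes NA

Solar <- read.csv(text = raw, header = FALSE,
                  colClasses = c("character", rep("numeric", 4)))
str(Solar)
```

Writing the pattern as " {2,}" makes the "two or more" requirement explicit, so the single blank inside "4 1" is left alone.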
Re: [R] "inahull" from package alphahull not working when used with lapply
Why not just > library (alphahull) > DT=data.frame(x=c(0.25,0.25,0.75,0.75),y=c(0.25,0.75,0.75,0.25)) > Hull <- ahull(DT, alpha = 0.5) > TEST<- data.frame(x=c(0.25,0.5),y=c(0.5,0.5)) > apply(TEST, 1, function(x) inahull(Hull, x)) [1] FALSE TRUE --------- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bart Kastermans Sent: Monday, October 27, 2014 2:10 PM To: r-help@r-project.org Subject: Re: [R] "inahull? from package alphahull not working when used with lapply On 27/10/14 19:42, Camilo Mora wrote: > Hi Bart, > > Even after putting the variables in the apply function, the results come not > right: > > library (alphahull) > DT=data.frame(x=c(0.25,0.25,0.75,0.75),y=c(0.25,0.75,0.75,0.25)) > Hull <- ahull(DT, alpha = 0.5) > > TEST<- data.frame(x=c(0.25,0.5),y=c(0.5,0.5)) > plot(Hull) > points(TEST) > > InHul2D <- function(Val1, Val2, Hull) inahull(Hull, p = c(Val1, Val2)) > > IN <- apply(TEST, 1, function(x,y) InHul2D("x","y",Hull)) > > Try with this version of your function: InHul2D <- function(Val1, Val2, Hull) { stopifnot(is.numeric(Val1), is.numeric(Val2)) inahull(Hull, p = c(Val1, Val2)) } And answer the question; why would you put quotes around x and y in InHul2D call in apply? Once you remove the quotes, and get the error "Error: argument "y" is missing, with no default" that I mentioned in my last email, look at my last email to find out why. I'll be happy to help you further with this, but then you have to explain the output you get from using my version of InHul2D (before you remove the quotes), and why my last email didn't solve the problem after you removed the quotes. 
Check ?stopifnot, and ?is.numeric Best, Bart
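The point about apply() in this thread can be tried without alphahull installed. The sketch below is only an illustration: in_unit_square() is a made-up stand-in for inahull(Hull, p = ...), and the second test point is changed so that one point falls outside. It shows that apply() hands each row to the function as a single vector, which is why a function written with two arguments (x, y) never receives y.

```r
# Hypothetical stand-in for inahull(Hull, p = c(x, y)); not part of alphahull.
# Returns TRUE when the point lies strictly inside the unit square.
in_unit_square <- function(p) all(p > 0 & p < 1)

TEST <- data.frame(x = c(0.25, 1.5), y = c(0.5, 0.5))

# apply() passes each row as ONE numeric vector, so the function takes a
# single argument; the coordinates are row[1] and row[2].
inside <- apply(TEST, 1, function(row) in_unit_square(row))

# If two separate arguments are preferred, mapply() walks the two columns
# in parallel and supplies x and y individually.
inside2 <- mapply(function(x, y) in_unit_square(c(x, y)), TEST$x, TEST$y)

print(inside)
print(inside2)
```

With the real package, the same shape would presumably be apply(TEST, 1, function(row) inahull(Hull, row)), as in David Carlson's reply.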
Re: [R] Selecting rows/columns of a matrix
Note that you do not have to create the vector of 1's (TRUE) and 0's (FALSE) if you know the index values:

> j <- c(2, 4, 6)
> a[j, j]
     [,1] [,2] [,3]
[1,]    8   20   32
[2,]   10   22   34
[3,]   12   24   36

========== David L. Carlson Department of Anthropology Texas A&M University

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Steven Yen Sent: Sunday, October 26, 2014 1:57 PM To: Rui Barradas; r-help Subject: Re: [R] Selecting rows/columns of a matrix

Rui

Thanks. This works great. Below, I get the 2nd, 4th, and 6th rows/columns:

> (a<-matrix(1:36,6,6))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    7   13   19   25   31
[2,]    2    8   14   20   26   32
[3,]    3    9   15   21   27   33
[4,]    4   10   16   22   28   34
[5,]    5   11   17   23   29   35
[6,]    6   12   18   24   30   36
> (j<-matrix(c(0,1,0,1,0,1)))
     [,1]
[1,]    0
[2,]    1
[3,]    0
[4,]    1
[5,]    0
[6,]    1
> ((a[as.logical(j), as.logical(j)]))
     [,1] [,2] [,3]
[1,]    8   20   32
[2,]   10   22   34
[3,]   12   24   36

Steven Yen

At 02:49 PM 10/26/2014, Rui Barradas wrote:
>Sorry, that should be
>
>t(a[as.logical(j), as.logical(j)])
>
>Rui Barradas
>
>On 26-10-2014 18:45, Rui Barradas wrote:
>>Hello,
>>
>>Try the following.
>>
>>a[as.logical(j), as.logical(j)]
>>
>># or
>>b <- a[as.logical(j), ]
>>t(b)[as.logical(j), ]
>>
>>
>>Hope this helps,
>>
>>Rui Barradas
>>
>>On 26-10-2014 18:35, Steven Yen wrote:
>>>Dear
>>>
>>>I am interested in selecting rows and columns of a matrix with a
>>>criterion defined by a binary indicator vector. Let matrix a be
>>>
>>> > a<-matrix(1:16, 4,4,byrow=T)
>>> > a
>>>     [,1] [,2] [,3] [,4]
>>>[1,]    1    2    3    4
>>>[2,]    5    6    7    8
>>>[3,]    9   10   11   12
>>>[4,]   13   14   15   16
>>>
>>>Elsewhere in Gauss, I select the first and third rows and columns of
>>>a by defining a column vector j = [1,0,1,0]. Then, select the rows of
>>>a using j, and then selecting the rows of the transpose of the
>>>resulting matrix using j again. I get the 2 x 2 matrix as desired. Is
>>>there a way to do this in R? below are my Gauss commands. Thank you.
>>>
>>>---
>>>
>>>j
>>>
>>>1
>>>0
>>>1
>>>0
>>>
>>>a=selif(a,j); a
>>>
>>>1 2 3 4
>>>9 10 11 12
>>>
>>>a=selif(a',j); a
>>>
>>>1 9
>>>3 11
Re: [R] how to calculate a numeric's digits count?
Where do these numbers come from? If they are calculated values, they are actually many decimal places longer than your examples. They are represented on your terminal with fewer decimals according to the setting of options("digits"). For example:

> sqrt(2)*sqrt(2)
[1] 2
> sqrt(2)*sqrt(2) == 2
[1] FALSE
# FAQ 7.31 Why doesn’t R think these numbers are equal?
> options("digits")
$digits
[1] 7

> options(digits=22)
> sqrt(2)*sqrt(2)
[1] 2.000000000000000444089

If the numbers were read from a plain text file and you are talking about how they are represented in the file, analyze them as character strings.

------------- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of PO SU Sent: Thursday, October 23, 2014 10:35 PM To: R. Help Subject: [R] how to calculate a numeric's digits count?

Dear usRers, Now I want to calculate, e.g., cal(1.234) will get 3, cal(1) will get 0, cal(1.3045) will get 4. But the difficult part is that cal(1.3450) should get 4, not 3. So, does anyone happen to know the solution to this problem, or can it not be solved in R, because 1.340 will always be transformed automatically to 1.34? -- PO SU mail: desolato...@163.com Majored in Statistics from SJTU
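The suggestion to analyze the values as character strings could look roughly like the sketch below. count_decimals() is a hypothetical helper name, not a base R function, and it assumes the values are available as text (for example read with colClasses = "character") in plain decimal notation without exponents.

```r
# Count the digits after the decimal point in numbers stored as text.
# count_decimals() is a hypothetical helper, not part of base R.
count_decimals <- function(s) {
  s <- trimws(s)
  has_point <- grepl(".", s, fixed = TRUE)
  # Drop everything up to and including the decimal point, then count the
  # characters that remain; strings without a decimal point count as 0.
  ifelse(has_point, nchar(sub("^[^.]*\\.", "", s)), 0L)
}

digits_after_point <- count_decimals(c("1.234", "1", "1.3045", "1.3450"))
print(digits_after_point)

# Once the text is converted to a number the trailing zero is gone, which is
# why the count has to be taken before conversion.
print(format(as.numeric("1.3450")))
```

The trailing zero in "1.3450" is kept only because the value never passes through as.numeric().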
Re: [R] assigning letter to a column
Minor correction, given your code, values less than 3 will be coded as "S" since they are less than 15.23. In the code I suggested, values less than 3 will be coded as missing (NA). David C -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson Sent: Friday, October 17, 2014 9:15 AM To: Monaly Mistry; r-help@r-project.org Subject: Re: [R] assigning letter to a column I think it is doing exactly what you have told it to do, but that is probably not what you want it to do. First, you do not need a loop since the ifelse() function is vectorized. Read the manual page and the examples carefully. Also you are coding ifelse() as if it were the same as if() {} else() {}. Again you need to refer to the documentation. Second, this seems like a job for cut() not ifelse(). Third, look at your code. The first statement is x$COR_LOC>=3 | x$COR_LOC<15.230 so everything greater than 3 will be coded as "S." That is probably all of your data. You probably want to use & (and) instead of | (or). It is not clear what you want to happen for values less than 3 but they will be NA (missing). Your entire ifelse() boils down to set.seed(42) x <- data.frame(COR_LOC=runif(100, 0, 30)) x$ForS <- cut(x$COR_LOC, breaks=c(3, 15.23, 19.81, 25.40, Inf), labels=c("S", "I1", "I2", "F"), right=FALSE) No loops, no ifelse's. Anything below 3 will - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Monaly Mistry Sent: Friday, October 17, 2014 8:27 AM To: r-help@r-project.org Subject: [R] assigning letter to a column Hi, I'm having trouble with assigning a letter to a column based on the value of another column. Since I have separate data files I've saved then into one folder and I'm reading them in separately into the function. The code is below. 
#F= fast; S= slow; I1= Intermediate score 1; I2=Intermediate score 2 filename<-list.files(pattern="*.txt") filename corloc<- function(x){ x<-read.table(filename[x], sep="\t", header=TRUE) #will extract the relevant data file from folder 1998. ex. corloc(1) will return 1998 breeding year data x[,"ForS"]<-0 #new column for (i in length(x$CORLOC)){ #this is the bit that I'm having a problem with since it's not assigning the appropriate letter into the "ForS" column ifelse(x$COR_LOC>=3 | x$COR_LOC<15.230, ForS<-"S", ifelse(x$COR_LOC>=15.230 | x$COR_LOC<19.810, ForS<-"I1", ifelse(x$COR_LOC>=19.810 | x$COR_LOC<25.540, FS<-"I2",ForS<-"F")))} print(x) } I've tried some of the solutions on stackoverflow but still was unsuccessful. Best, Monaly. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] assigning letter to a column
I think it is doing exactly what you have told it to do, but that is probably not what you want it to do. First, you do not need a loop since the ifelse() function is vectorized. Read the manual page and the examples carefully. Also you are coding ifelse() as if it were the same as if() {} else() {}. Again you need to refer to the documentation. Second, this seems like a job for cut() not ifelse(). Third, look at your code. The first statement is x$COR_LOC>=3 | x$COR_LOC<15.230 so everything greater than 3 will be coded as "S." That is probably all of your data. You probably want to use & (and) instead of | (or). It is not clear what you want to happen for values less than 3 but they will be NA (missing). Your entire ifelse() boils down to set.seed(42) x <- data.frame(COR_LOC=runif(100, 0, 30)) x$ForS <- cut(x$COR_LOC, breaks=c(3, 15.23, 19.81, 25.40, Inf), labels=c("S", "I1", "I2", "F"), right=FALSE) No loops, no ifelse's. Anything below 3 will - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Monaly Mistry Sent: Friday, October 17, 2014 8:27 AM To: r-help@r-project.org Subject: [R] assigning letter to a column Hi, I'm having trouble with assigning a letter to a column based on the value of another column. Since I have separate data files I've saved then into one folder and I'm reading them in separately into the function. The code is below. #F= fast; S= slow; I1= Intermediate score 1; I2=Intermediate score 2 filename<-list.files(pattern="*.txt") filename corloc<- function(x){ x<-read.table(filename[x], sep="\t", header=TRUE) #will extract the relevant data file from folder 1998. ex. 
corloc(1) will return 1998 breeding year data x[,"ForS"]<-0 #new column for (i in length(x$CORLOC)){ #this is the bit that I'm having a problem with since it's not assigning the appropriate letter into the "ForS" column ifelse(x$COR_LOC>=3 | x$COR_LOC<15.230, ForS<-"S", ifelse(x$COR_LOC>=15.230 | x$COR_LOC<19.810, ForS<-"I1", ifelse(x$COR_LOC>=19.810 | x$COR_LOC<25.540, FS<-"I2",ForS<-"F")))} print(x) } I've tried some of the solutions on stackoverflow but still was unsuccessful. Best, Monaly. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
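A small check of the two points made in the reply, using a handful of hand-picked values instead of the breeding data. The upper break of 25.54 is taken from the original question (the reply's code has 25.40), so treat that choice as an assumption.

```r
# Hand-picked values that sit on and around the class boundaries.
x <- data.frame(COR_LOC = c(1.0, 3.0, 15.23, 19.0, 19.81, 25.0, 28.0))

# With | the first condition is TRUE for every value, because any number is
# either >= 3 or < 15.23, so a nested ifelse() never gets past "S".
always_true <- all(x$COR_LOC >= 3 | x$COR_LOC < 15.23)

# cut() with right = FALSE builds the intervals [3, 15.23), [15.23, 19.81),
# [19.81, 25.54) and [25.54, Inf); values below 3 fall outside and become NA.
x$ForS <- cut(x$COR_LOC,
              breaks = c(3, 15.23, 19.81, 25.54, Inf),
              labels = c("S", "I1", "I2", "F"),
              right = FALSE)
print(x)
```

Because cut() is vectorized, the for loop in the original function is not needed; the whole column is classified in one call.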
Re: [R] Ternary Plots Do Not Display Ellipses in PDF
I haven't looked at the source so I don't know exactly what is going on, but I think I have a workaround. While running your example I noticed that ellipses() does not just add the ellipse to the plot produced by plot(), it replots the figure. However, just running ellipses() without plot() generates an error "Error in if (coorgeo == "acomp") { : argument is of length zero" so ellipses() needs the plot environment produced by plot(). Moving the pdf() call so that it comes after plot() works on my Windows machine:

> plot(winters.acomp, main="Winters Creek", cex=0.5)
> pdf("winters-pdf.pdf")
> ellipses(mean=mn, var=vr, r=r, steps=72, thinRatio=NULL, aspanel=FALSE,
+ col='red', lwd=2)
> dev.off()

- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rich Shepard Sent: Tuesday, October 14, 2014 4:21 PM To: r-help@r-project.org Subject: [R] Ternary Plots Do Not Display Ellipses in PDF

A rather strange situation here and I've not found the source of the problem. The point is to print a ternary plot matrix of compositional data with ellipses enclosing 95% of the variance in each plot. The ellipses display on the monitor, dev = x11cairo (see attached winters-x11cairo.pdf), but not when sent directly to a file, dev = pdf (see attached winters-pdf.pdf).
Here's winters.acomp: structure(c(0.0667, 0.0612244897959184, 0.0434782608695652, 0.043956043956044, 0.05, 0.0161290322580645, 0.6, 0.571428571428571, 0.623188405797101, 0.593406593406593, 0.433, 0.629032258064516, 0.0667, 0.0612244897959184, 0.101449275362319, 0.0659340659340659, 0.0667, 0.032258064516129, 0.244, 0.26530612244898, 0.217391304347826, 0.263736263736264, 0.367, 0.290322580645161, 0.0222, 0.0408163265306122, 0.0144927536231884, 0.032967032967033, 0.0833, 0.032258064516129), .Dim = c(6L, 5L), .Dimnames = list( NULL, c("filter", "gather", "graze", "predate", "shred")), class = "acomp") And this is the command sequence: > library(compositions) > plot(winters.acomp, main="Winters Creek", cex=0.5) > r <- sqrt(qchisq(p=0.95, df=4)) > mn <- mean(winters.acomp) > vr <- var(winters.acomp) > plot(winters.acomp, main="Winters Creek", cex=0.5) > ellipses(mean=mn, var=vr, r=r, steps=72, thinRatio=NULL, aspanel=FALSE, col='red', lwd=2) # monitor plot window is manually closed. > pdf("winters-pdf.pdf") > plot(winters.acomp, main="Winters Creek", cex=0.5) > ellipses(mean=mn, var=vr, r=r, steps=72, thinRatio=NULL, aspanel=FALSE, col='red', lwd=2) > dev.off() What am I not seeing here that causes the different outputs? Rich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Storing vectors as vectors in a list without losing each individual vector
If you just want to plot the various combinations of a set of variables/columns, you don't need a list, just another data frame/matrix with the combinations of the column numbers you want to plot:

> df <- matrix(rnorm(100), 10, 10)
> df <- data.frame(df)
> comb <- expand.grid(7:10, 7:10)
> comb <- comb[comb[,1] < comb[,2],]
> rownames(comb) <- NULL
> comb
  Var1 Var2
1    7    8
2    7    9
3    8    9
4    7   10
5    8   10
6    9   10
> windows(record=TRUE)
> apply(comb, 1, function(x) plot(df[,x[1]], df[,x[2]],
+ main=paste("Plot of", x[1], "with", x[2])))
NULL

- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Patricia Seo Sent: Monday, October 13, 2014 6:28 PM To: r-help@r-project.org Subject: [R] Storing vectors as vectors in a list without losing each individual vector

Hi everyone, My help request is similar to what was asked by Ken Termiso on April 18th, 2005. Link here: https://stat.ethz.ch/pipermail/r-help/2005-April/069729.html Matt Wiener answered with suggesting a vector list where you hand type each of the vectors. This is not what I want to do. What I want to do is automate the process. So, in other words creating a list through a loop. For example: My data frame is called "df" and I have four variables/vectors that are v7, v8, v9, v10. Each variable/vector is an integer (no character strings). I want to create a list called "Indexes" so that I can use this list for "for-in" loops to SEPARATELY plot each and every variable/vector. If I followed Matt Wiener's suggestion, I would input this:

Indexes = list()
Indexes[[1]] = df$v7
Indexes[[2]] = df$v8
Indexes[[3]] = df$v9
Indexes[[4]] = df$v10

But if I want to include more than four variable/vectors (let's say I want to include 25 of them!), I do not want to have to type all of it.
If I do the following command: Indexes <- c(df$v7, df$v8, df$v9, df$v10) then I run into the same problem as Ken Termiso with having all the integers in one vector. I need to keep the variables/vectors separate. Is this just not possible in R? Any help would be great. Thank you! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
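On the narrower question of building the list without typing each element: a data frame is already a list of its columns, so the list can be obtained by selecting columns by name. The sketch below uses made-up integer data and assumes the columns really are named v7 to v10 as described in the question.

```r
# Made-up integer data standing in for the real data frame.
set.seed(1)
df <- data.frame(matrix(sample(1:20, 40, replace = TRUE), nrow = 10))
names(df) <- paste0("v", 7:10)

wanted <- paste0("v", 7:10)        # extend to as many columns as needed

# A data frame is a list of columns, so selecting columns and calling
# as.list() keeps every vector separate (and keeps its name).
Indexes <- as.list(df[wanted])

# The same list built element by element with lapply().
Indexes2 <- lapply(wanted, function(nm) df[[nm]])

# Each element is still its own vector and can be used in a loop.
for (nm in names(Indexes)) {
  cat(nm, "has", length(Indexes[[nm]]), "values\n")
}
```

Inside the loop, a call such as plot(Indexes[[nm]], main = nm) would draw each variable separately. c() flattens because it concatenates atomic vectors, whereas list() and as.list() keep them apart.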
Re: [R] cbind in a loop...better way? | summary
Actually Jeff Laake's can be made even shorter with

sapply(mat_list, as.vector)

David C

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Evan Cooch Sent: Thursday, October 9, 2014 7:37 AM To: Evan Cooch; r-help@r-project.org Subject: Re: [R] cbind in a loop...better way? | summary

Two solutions proposed -- not entirely orthogonal, but both do the trick. Instead of nesting cbind in a loop (as I did originally -- OP, below),

1\ do.call(cbind, lapply(mat_list, as.vector))

or

2\ sapply(mat_list,function(x) as.vector(x))

Both work fine. Thanks to Jeff Laake (2) + David Carlson (1) for their suggestions.

On 10/8/2014 3:12 PM, Evan Cooch wrote:
> ...or some such. I'm trying to work up a function wherein the user
> passes a list of matrices to the function, which then (1) takes each
> matrix, (2) performs an operation to 'vectorize' the matrix (i.e.,
> given an (m x n) matrix x, this produces the vector Y of length m*n
> that contains the columns of the matrix x, stacked below each other),
> and then (3) cbinds them together.
>
> Here is an example using the case where I know how many matrices I
> need to cbind together. For this example, 2 square (3x3) matrices:
>
> a <- matrix(c(0,20,50,0.05,0,0,0,0.1,0),3,3,byrow=T)
> b <- matrix(c(0,15,45,0.15,0,0,0,0.2,0),3,3,byrow=T)
>
> I want to vec them, and then cbind them together. So,
>
> result <- cbind(matrix(a,nr=9), matrix(b,nr=9))
>
> which yields the following:
>
>        [,1]  [,2]
>  [1,]  0.00  0.00
>  [2,]  0.05  0.15
>  [3,]  0.00  0.00
>  [4,] 20.00 15.00
>  [5,]  0.00  0.00
>  [6,]  0.10  0.20
>  [7,] 50.00 45.00
>  [8,]  0.00  0.00
>  [9,]  0.00  0.00
>
> Easy enough. But, I want to put it in a function, where the number and
> dimensions of the matrices is not specified. Something like
>
> Using matrices (a) and (b) from above, let
>
> env <- list(a,b).
>
> Now, a function (or attempt at same) to perform the desired operations:
>
> vec=function(matlist) {
>
>   n_mat=length(matlist);
>   size_mat=dim(matlist[[1]])[1];
>
>   result=cbind()
>
>   for (i in 1:n_mat) {
>     result=cbind(result,matrix(matlist[[i]],nr=size_mat^2))
>   }
>
>   return(result)
>
> }
>
>
> When I run vec(env), I get the *right answer*, but I am wondering if
> there is a *better* way to get there from here than the approach I use
> (above). I'm not so much interested in 'computational efficiency' as I
> am in stability, and flexibility.
>
> Thanks...
Re: [R] cbind in a loop...better way?
How about

> do.call(cbind, lapply(env, as.vector))
       [,1]  [,2]
 [1,]  0.00  0.00
 [2,]  0.05  0.15
 [3,]  0.00  0.00
 [4,] 20.00 15.00
 [5,]  0.00  0.00
 [6,]  0.10  0.20
 [7,] 50.00 45.00
 [8,]  0.00  0.00
 [9,]  0.00  0.00

----- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Evan Cooch Sent: Wednesday, October 8, 2014 2:13 PM To: r-help@r-project.org Subject: [R] cbind in a loop...better way?

...or some such. I'm trying to work up a function wherein the user passes a list of matrices to the function, which then (1) takes each matrix, (2) performs an operation to 'vectorize' the matrix (i.e., given an (m x n) matrix x, this produces the vector Y of length m*n that contains the columns of the matrix x, stacked below each other), and then (3) cbinds them together. Here is an example using the case where I know how many matrices I need to cbind together. For this example, 2 square (3x3) matrices:

a <- matrix(c(0,20,50,0.05,0,0,0,0.1,0),3,3,byrow=T)
b <- matrix(c(0,15,45,0.15,0,0,0,0.2,0),3,3,byrow=T)

I want to vec them, and then cbind them together. So,

result <- cbind(matrix(a,nr=9), matrix(b,nr=9))

which yields the following:

       [,1]  [,2]
 [1,]  0.00  0.00
 [2,]  0.05  0.15
 [3,]  0.00  0.00
 [4,] 20.00 15.00
 [5,]  0.00  0.00
 [6,]  0.10  0.20
 [7,] 50.00 45.00
 [8,]  0.00  0.00
 [9,]  0.00  0.00

Easy enough. But, I want to put it in a function, where the number and dimensions of the matrices is not specified. Something like

Using matrices (a) and (b) from above, let env <- list(a,b).
Now, a function (or attempt at same) to perform the desired operations: vec=function(matlist) { n_mat=length(matlist); size_mat=dim(matlist[[1]])[1]; result=cbind() for (i in 1:n_mat) { result=cbind(result,matrix(matlist[[i]],nr=size_mat^2)) } return(result) } When I run vec(env), I get the *right answer*, but I am wondering if there is a *better* way to get there from here than the approach I use (above). I'm not so much interested in 'computational efficiency' as I am in stability, and flexibility. Thanks... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
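For comparison, the two loop-free forms discussed in this thread can be run side by side on the matrices from the question. This is only a sketch; mat_list is the name used in the follow-up summary, and both forms assume every matrix has the same number of elements.

```r
# The two 3 x 3 matrices from the question.
a <- matrix(c(0, 20, 50, 0.05, 0, 0, 0, 0.1, 0), 3, 3, byrow = TRUE)
b <- matrix(c(0, 15, 45, 0.15, 0, 0, 0, 0.2, 0), 3, 3, byrow = TRUE)
mat_list <- list(a, b)

# as.vector() stacks the columns of a matrix; cbind() then joins the stacks.
res1 <- do.call(cbind, lapply(mat_list, as.vector))

# sapply() simplifies equal-length results to a matrix, one column per input.
res2 <- sapply(mat_list, as.vector)

print(res1)
print(dim(res2))
```

If the matrices differ in size, sapply() returns a list instead of a matrix and cbind() recycles the shorter vectors, so a length check up front (for example on lengths(mat_list)) may be worth adding when stability matters more than speed.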
Re: [R] Conditional Data Manipulation -Cumulative Product
I think this works, at least for your example data. The function SSRuns gets the index values of the starting points and then finds the first ending point that is greater or equal. Then we cycle through the starting points and print the index values from start to stop. Those are combined into a single vector which is used to create each column of the mask for the data. SSRuns <- function(x, y, rows) { a <- which(x>0) b <- which(y>0) d <- unlist(lapply(seq_along(a), function(i) a[i]:head(b[a[i] <= b], 1))) v <- rep(0, rows) v[d] <- 1 return(v) } mask <- sapply(StartSignals[,-1], SSRuns, y=StopSignals$Stop, rows=nrow(MainData)) Results <- data.frame(Date=MainData$Date, MainData[,-1]*mask) Results Date X1 X2 X3 X4 X5 1 2014-01-01 0.00 0.00 0.00 0.00 0.00 2 2014-01-02 0.00 1.51 0.00 0.00 1.24 3 2014-01-03 0.00 0.09 0.20 0.00 0.30 4 2014-01-04 0.00 0.00 0.00 0.00 0.00 5 2014-01-05 1.04 0.00 0.00 0.00 1.23 6 2014-01-06 0.00 0.00 0.76 0.00 0.00 7 2014-01-07 0.00 0.00 1.22 0.66 0.00 8 2014-01-08 0.00 0.00 0.27 0.09 0.00 9 2014-01-09 0.00 0.00 0.00 0.00 0.00 10 2014-01-10 0.00 0.00 1.68 0.98 0.00 11 2014-01-11 0.43 0.00 1.98 1.46 0.00 12 2014-01-12 1.51 0.78 1.63 0.46 1.84 13 2014-01-13 0.26 0.34 0.34 0.97 1.13 David C -Original Message- From: Pooya Lalehzari [mailto:plalehz...@platinumlp.com] Sent: Tuesday, October 7, 2014 8:06 PM To: David L Carlson Subject: RE: [R] Conditional Data Manipulation -Cumulative Product Hi David, I also made a dput of the Expected Results in case if you want to read it in: > dput(ExpResults) structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", "1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", "1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(0, 0, 0, 0, 1.04, 0, 0, 0, 0, 0, 0.43, 0.65, 0.17), X2 = c(0, 1.51, 0.14, 0, 0, 0, 0, 0, 0, 0, 0, 0.78, 0.27), X3 = c(0, 0, 0.2, 0, 0, 0.76, 0.93, 0.25, 0, 1.68, 3.33, 5.42, 1.84), X4 = c(0, 0, 0, 0, 0, 0, 0.66, 0.06, 0, 0.98, 1.43, 0.66, 0.64), X5 = c(0, 1.24, 0.37, 
0, 1.23, 0, 0, 0, 0, 0, 0, 1.84, 2.08)), .Names = c("Date", "X1", "X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA, -13L)) -Original Message- From: David L Carlson [mailto:dcarl...@tamu.edu] Sent: Tuesday, October 07, 2014 5:03 PM To: Pooya Lalehzari Cc: R help Subject: RE: [R] Conditional Data Manipulation -Cumulative Product More clear to read, but this is much easier to load into R. Then adding StartSignals$Date <- as.Date(StartSignals$Date, "%m/%d/%Y") MainData$Date <- as.Date(MainData$Date, "%m/%d/%Y") StopSignals$Date <- as.Date(StopSignals$Date, "%m/%d/%Y") Creates date objects out of the character strings. But what should the final result look like? For example X1 has two start dates, "2014-01-05" and "2014-01-11" and you have stop dates of "2014-01-03", "2014-01-05", "2014-01-08", and "2014-01-13". So for X1 "2014-01-05" is both a start and stop date (value 1.04) and the second start/end would be "2014-01-11" to "2014-01-13" (values .43, 1.51, .26). What do you mean by compounding? David C -Original Message- From: Pooya Lalehzari [mailto:plalehz...@platinumlp.com] Sent: Tuesday, October 7, 2014 2:59 PM To: David L Carlson Subject: RE: [R] Conditional Data Manipulation -Cumulative Product Dear David, This is the dput output but I think the previous email had it more clearly. 
> dput(StartSignals) structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", "1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", "1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), X2 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), X3 = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), X4 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L), X5 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)), .Names = c("Date", "X1", "X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA, -13L)) > dput(MainData) structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", "1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", "1/11/2014", "1/12/2014", "1/13/2014"), X1
Re: [R] Conditional Data Manipulation -Cumulative Product
More clear to read, but this is much easier to load into R. Then adding StartSignals$Date <- as.Date(StartSignals$Date, "%m/%d/%Y") MainData$Date <- as.Date(MainData$Date, "%m/%d/%Y") StopSignals$Date <- as.Date(StopSignals$Date, "%m/%d/%Y") Creates date objects out of the character strings. But what should the final result look like? For example X1 has two start dates, "2014-01-05" and "2014-01-11" and you have stop dates of "2014-01-03", "2014-01-05", "2014-01-08", and "2014-01-13". So for X1 "2014-01-05" is both a start and stop date (value 1.04) and the second start/end would be "2014-01-11" to "2014-01-13" (values .43, 1.51, .26). What do you mean by compounding? David C -Original Message- From: Pooya Lalehzari [mailto:plalehz...@platinumlp.com] Sent: Tuesday, October 7, 2014 2:59 PM To: David L Carlson Subject: RE: [R] Conditional Data Manipulation -Cumulative Product Dear David, This is the dput output but I think the previous email had it more clearly. > dput(StartSignals) structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", "1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", "1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), X2 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), X3 = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), X4 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L), X5 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)), .Names = c("Date", "X1", "X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA, -13L)) > dput(MainData) structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", "1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", "1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(1.92, 0.67, 1.09, 1.81, 1.04, 1.69, 1.57, 0.5, 0, 1.31, 0.43, 1.51, 0.26), X2 = c(1.38, 1.51, 0.09, 1.33, 0.38, 1.12, 1.3, 1.75, 1.26, 1.57, 1.63, 0.78, 0.34), X3 = c(0.83, 1.21, 0.2, 1.57, 1.72, 0.76, 1.22, 
0.27, 0.59, 1.68, 1.98, 1.63, 0.34), X4 = c(1.25, 0.06, 1.62, 1.68, 1.98, 1.45, 0.66, 0.09, 0.4, 0.98, 1.46, 0.46, 0.97), X5 = c(1.12, 1.24, 0.3, 1.41, 1.23, 1.99, 1.75, 1.91, 1.81, 1.79, 0.81, 1.84, 1.13)), .Names = c("Date", "X1", "X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA, -13L)) > dput(StopSignals) structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", "1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", "1/11/2014", "1/12/2014", "1/13/2014"), Stop = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)), .Names = c("Date", "Stop"), class = "data.frame", row.names = c(NA, -13L)) -Original Message- From: David L Carlson [mailto:dcarl...@tamu.edu] Sent: Tuesday, October 07, 2014 3:13 PM To: Pooya Lalehzari; R help Subject: RE: [R] Conditional Data Manipulation -Cumulative Product You need to use plain text, not html in your email. Your data are scrambled (see below). It is better to send your data using the R dput() function: dput(StartSignals) dput(MainData) dput(StopSignals) - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Pooya Lalehzari Sent: Tuesday, October 7, 2014 11:55 AM To: R help Subject: [R] Conditional Data Manipulation -Cumulative Product Hello, I have three datasets StartSignals, MainData, StopSignals and need to compound the data for each variable in MainData over dates that fall between the Start and Stop signals. (Stop signals are common and the same to all X1:X5 variables). Please see sample below: The one way I was thinking of doing this project was to setup a nested "FOR" loop and go through the three data matrices. Is there a more elegant way of doing this? Thank you. 
StartSignals: Date X1 X2 X3 X4 X5 1/1/2014 0 0 0 0 0 1/2/2014 0 1 0 0 1 1/3/2014 0 0 1 0 0 1/4/2014 0 0 0 0 0 1/5/2014 1 0 0 0 1 1/6/2014 0 0 1 0 0 1/7/2014 0 0 0 1 0 1/8/2014 0 0 0 0 0 1/9/2014 0 0 0 0 0 1/10/2014 0 0 1 1 0 1/11/2014 1 0 0 0 0 1/12/2014 0 1 0 0 1 1/
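The as.Date() conversion used in the reply can be checked on a few of the date strings from the data. This is a small sketch; dates_chr is an illustrative name, not an object from the thread.

```r
# A few of the month/day/year strings as they appear in the dput() output.
dates_chr <- c("1/1/2014", "1/12/2014", "1/13/2014")

# "%m/%d/%Y" tells as.Date() the field order; the result is a Date vector
# that prints in ISO form and supports comparison and arithmetic.
dates <- as.Date(dates_chr, format = "%m/%d/%Y")

print(format(dates))
print(diff(dates))                       # gaps in days between entries
print(dates > as.Date("2014-01-05"))     # comparison against a start date
```

Comparisons on the original character strings would be alphabetical rather than chronological, which is the main reason to convert before matching start and stop dates.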
Re: [R] Conditional Data Manipulation -Cumulative Product
You need to use plain text, not HTML, in your email. Your data are scrambled (see below). It is better to send your data using the R dput() function:

dput(StartSignals)
dput(MainData)
dput(StopSignals)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Pooya Lalehzari
Sent: Tuesday, October 7, 2014 11:55 AM
To: R help
Subject: [R] Conditional Data Manipulation -Cumulative Product

Hello,
I have three datasets, StartSignals, MainData, and StopSignals, and need to compound the data for each variable in MainData over the dates that fall between the Start and Stop signals. (The Stop signals are common to all X1:X5 variables.) Please see the sample below.
One way I was thinking of doing this was to set up a nested "for" loop and go through the three data matrices. Is there a more elegant way of doing this?
Thank you.

StartSignals:
Date        X1    X2    X3    X4    X5
1/1/2014    0     0     0     0     0
1/2/2014    0     1     0     0     1
1/3/2014    0     0     1     0     0
1/4/2014    0     0     0     0     0
1/5/2014    1     0     0     0     1
1/6/2014    0     0     1     0     0
1/7/2014    0     0     0     1     0
1/8/2014    0     0     0     0     0
1/9/2014    0     0     0     0     0
1/10/2014   0     0     1     1     0
1/11/2014   1     0     0     0     0
1/12/2014   0     1     0     0     1
1/13/2014   0     0     0     0     0

MainData:
Date        X1    X2    X3    X4    X5
1/1/2014    1.92  1.38  0.83  1.25  1.12
1/2/2014    0.67  1.51  1.21  0.06  1.24
1/3/2014    1.09  0.09  0.2   1.62  0.3
1/4/2014    1.81  1.33  1.57  1.68  1.41
1/5/2014    1.04  0.38  1.72  1.98  1.23
1/6/2014    1.69  1.12  0.76  1.45  1.99
1/7/2014    1.57  1.3   1.22  0.66  1.75
1/8/2014    0.5   1.75  0.27  0.09  1.91
1/9/2014    0     1.26  0.59  0.4   1.81
1/10/2014   1.31  1.57  1.68  0.98  1.79
1/11/2014   0.43  1.63  1.98  1.46  0.81
1/12/2014   1.51  0.78  1.63  0.46  1.84
1/13/2014   0.26  0.34  0.34  0.97  1.13

StopSignals:
Date        Stop
1/1/2014    0
1/2/2014    0
1/3/2014    1
1/4/2014    0
1/5/2014    1
1/6/2014    0
1/7/2014    0
1/8/2014    1
1/9/2014    0
1/10/2014   0
1/11/2014   0
1/12/2014   0
1/13/2014   1

ExpectedResult:
Date        X1    X2    X3    X4    X5
1/1/2014    0     0     0     0     0
1/2/2014    0     1.51  0     0     1.24
1/3/2014    0     0.14  0.2   0     0.37
1/4/2014    0     0     0     0     0
1/5/2014    1.04  0     0     0     1.23
1/6/2014    0     0     0.76  0     0
1/7/2014    0     0     0.93  0.66  0
1/8/2014    0     0     0.25  0.06  0
1/9/2014    0     0     0     0     0
1/10/2014   0     0     1.68  0.98  0
1/11/2014   0.43  0     3.33  1.43  0
1/12/2014   0.65  0.78  5.42  0.66  1.84
1/13/2014   0.17  0.27  1.84  0.64  2.08

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
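The thread does not include a worked answer, so here is a minimal sketch of one way to get the ExpectedResult values, assuming the rule implied by the sample: a run opens on a start day with that day's MainData value, each later day multiplies the running product, the stop day is the last day of the run, and a start signal that arrives during an open run is ignored. That last point is an assumption, since the sample never shows it. The function name compound_between and the argument names are hypothetical, and only the X3 and X5 columns are reproduced to keep the example short.

```r
# Compound one column between start and stop signals (a sketch, not the poster's code).
compound_between <- function(main, start_sig, stop_sig) {
  out <- numeric(length(main))
  active <- FALSE
  running <- 0
  for (i in seq_along(main)) {
    if (active) {
      running <- running * main[i]   # already inside a run: keep compounding
    } else if (start_sig[i] == 1) {
      active <- TRUE                 # a start signal opens a new run
      running <- main[i]
    }
    if (active) out[i] <- running
    # the stop day itself is included, then the run closes
    if (active && stop_sig[i] == 1) active <- FALSE
  }
  out
}

# X3 and X5 copied from the StartSignals, MainData and StopSignals tables above
start <- cbind(X3 = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0),
               X5 = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0))
main  <- cbind(X3 = c(0.83, 1.21, 0.2, 1.57, 1.72, 0.76, 1.22,
                      0.27, 0.59, 1.68, 1.98, 1.63, 0.34),
               X5 = c(1.12, 1.24, 0.3, 1.41, 1.23, 1.99, 1.75,
                      1.91, 1.81, 1.79, 0.81, 1.84, 1.13))
stop_all <- c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1)

# One call per column replaces the nested loop over variables
result <- sapply(colnames(main), function(v)
  compound_between(main[, v], start[, v], stop_all))
round(result, 2)
```

A single loop per column, applied with sapply(), avoids nesting loops over the three matrices; the same call works for all of X1:X5 once the full tables are read in.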
Re: [R] How to extract table results from survival summary object
This will create a data.frame containing the results of the summary(mod) object. You can find out what that is using the command ?summary.survfit. You have an error in your example since death is not a variable in lung:

> library(survival)
> data(lung)
> mod <- with(lung, survfit(Surv(time, status)~ 1))
> res <- summary(mod)
> str(res)
List of 14
 $ n        : int 228
 $ time     : num [1:139] 5 11 12 13 15 26 30 31 53 54 ...
 $ n.risk   : num [1:139] 228 227 224 223 221 220 219 218 217 215 ...
 $ n.event  : num [1:139] 1 3 1 2 1 1 1 1 2 1 ...
 $ n.censor : num [1:139] 0 0 0 0 0 0 0 0 0 0 ...
 $ surv     : num [1:139] 0.996 0.982 0.978 0.969 0.965 ...
 $ type     : chr "right"
 $ std.err  : num [1:139] 0.00438 0.00869 0.0097 0.01142 0.01219 ...
 $ upper    : num [1:139] 1 1 0.997 0.992 0.989 ...
 $ lower    : num [1:139] 0.987 0.966 0.959 0.947 0.941 ...
 $ conf.type: chr "log"
 $ conf.int : num 0.95
 $ call     : language survfit(formula = Surv(time, status) ~ 1)
 $ table    : Named num [1:7] 228 228 228 165 310 285 363
  ..- attr(*, "names")= chr [1:7] "records" "n.max" "n.start" "events" ...
 - attr(*, "class")= chr "summary.survfit"
> # Extract the columns you want
> cols <- lapply(c(2:6, 8:10) , function(x) res[x])
> # Combine the columns into a data frame
> tbl <- do.call(data.frame, cols)
> str(tbl)
'data.frame': 139 obs. of 8 variables:
 $ time    : num 5 11 12 13 15 26 30 31 53 54 ...
 $ n.risk  : num 228 227 224 223 221 220 219 218 217 215 ...
 $ n.event : num 1 3 1 2 1 1 1 1 2 1 ...
 $ n.censor: num 0 0 0 0 0 0 0 0 0 0 ...
 $ surv    : num 0.996 0.982 0.978 0.969 0.965 ...
 $ std.err : num 0.00438 0.00869 0.0097 0.01142 0.01219 ...
 $ upper   : num 1 1 0.997 0.992 0.989 ...
 $ lower   : num 0.987 0.966 0.959 0.947 0.941 ...

Since res is a list containing the columns you want plus other information, we need to extract the needed columns from res and then combine those columns into a data.frame.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Juan Andres Hernandez
Sent: Tuesday, October 7, 2014 8:41 AM
To: r-help@r-project.org
Subject: [R] How to extract table results from survival summary object

Hi. I need to extract the "matrix" or "data.frame" results from a survival object.

library(survival)
data(lung)
mod=with(lung, survfit(Surv(time,death)~ 1))
res=summary(mod)
res

This shows in the console the "matrix" I am looking for, but I can't find a way to save or assign this table to an object. Does anyone know how to solve this?
Thanks in advance,
Juan A. Hernández
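A variant of the same idea, sketched under the assumption that the component names shown in the str(res) output above exist in your installed version of survival. Selecting by name rather than by position avoids relying on the order of the list, which may differ between package versions. The names wanted and tbl2 are only illustrative.

```r
library(survival)

# Same Kaplan-Meier fit as in the reply above (lung ships with survival)
mod <- survfit(Surv(time, status) ~ 1, data = lung)
res <- summary(mod)

# Per-time-point components, chosen by name instead of by position
wanted <- c("time", "n.risk", "n.event", "n.censor",
            "surv", "std.err", "lower", "upper")
# Keep only the names this version of survival actually provides
wanted <- intersect(wanted, names(res))

# unclass() makes the subsetting plain list subsetting
tbl2 <- as.data.frame(unclass(res)[wanted])
head(tbl2)
```

The result can then be saved with write.csv(tbl2, ...) or used like any other data frame.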
Re: [R] Using PCA to filter a series
You can reconstruct the data from the first component. Here's an example using singular value decomposition on the original data matrix:

> d <- cbind(d1, d2, d3, d4)
> d.svd <- svd(d)
> new <- d.svd$u[,1] * d.svd$d[1]

new is basically your cp1. If we multiply it by each of the loadings, we can create reconstructed values based on the first component:

> dnew <- sapply(d.svd$v[,1], function(x) new * x)
> round(head(dnew), 1)
      [,1]  [,2]  [,3]  [,4]
[1,] 119.3 134.1 135.7 134.6
[2,] 104.2 117.2 118.6 117.6
[3,] 109.7 123.3 124.8 123.8
[4,] 109.3 122.9 124.3 123.3
[5,] 105.8 119.0 120.4 119.4
[6,] 111.5 125.4 126.9 125.8
> head(d)
      d1  d2  d3  d4
[1,] 113 138 138 134
[2,] 108 115 120 115
[3,] 105 127 129 120
[4,] 103 127 129 120
[5,] 109 119 120 117
[6,] 115 126 126 123
> diag(cor(d, dnew))
[1] 0.9233742 0.9921703 0.9890085 0.9910287

Since you want a single variable to stand for all four, you could scale new to the mean:

> newd <- new*mean(d.svd$v[,1])
> head(newd)
[1] 130.9300 114.3972 120.3884 119.9340 116.1588 122.3983

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: Jonathan Thayn [mailto:jth...@ilstu.edu]
Sent: Thursday, October 2, 2014 11:11 PM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Using PCA to filter a series

I suppose I could calculate the eigenvectors directly and not worry about centering the time-series, since they have essentially the same range to begin with:

vec <- eigen(cor(cbind(d1,d2,d3,d4)))$vectors
cp <- cbind(d1,d2,d3,d4)%*%vec
cp1 <- cp[,1]

I guess there is no way to reconstruct the original input data using just the first component, though, is there? Not the original data in its entirety, just one time-series that would be representative of the general pattern. Possibly something like the following, but with just the first component:

o <- cp%*%solve(vec)

Thanks for your help. It's been a long time since I've played with PCA.

Jonathan Thayn

On Oct 2, 2014, at 4:59 PM, David L Carlson wrote:

> I think you want to convert your principal component to the same scale as d1,
> d2, d3, and d4. But the "original space" is a 4-dimensional space in which
> d1, d2, d3, and d4 are the axes, each with its own mean and standard
> deviation. Here are a couple of possibilities
>
> # plot original values for comparison
>> matplot(cbind(d1, d2, d3, d4), pch=20, col=2:5)
> # standardize the pc scores to the grand mean and sd
>> new1 <- scale(pca$scores[,1])*sd(c(d1, d2, d3, d4)) + mean(c(d1, d2, d3, d4))
>> lines(new1)
> # Use least squares regression to predict the row means for the original four
> variables
>> new2 <- predict(lm(rowMeans(cbind(d1, d2, d3, d4))~pca$scores[,1]))
>> lines(new2, col="red")
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Don McKenzie
> Sent: Thursday, October 2, 2014 4:39 PM
> To: Jonathan Thayn
> Cc: r-help@r-project.org
> Subject: Re: [R] Using PCA to filter a series
>
> On Oct 2, 2014, at 2:29 PM, Jonathan Thayn wrote:
>
>> Hi Don. I would like to "de-rotate" the first component back to its original
>> state so that it aligns with the original time-series. My goal is to create
>> a "cleaned", or a "model" time-series from which noise has been removed.
>
> Please cc the list with replies. It's considered courtesy plus you'll get
> more help that way than just from me.
>
> Your goal sounds almost metaphorical, at least to me. Your first axis
> "aligns" with the original time series already in that it captures the
> dominant variation
> across all four. Beyond that, there are many approaches to signal/noise
> relations within time-series analysis. I am not a good source of help on
> these, and you probably need a statistical consult (locally?), which is not
> the function of this list.
>
>> Jonathan Thayn
>>
>> On Oct 2, 2014, at 2:33 PM, Don McKenzie wrote:
>>
>>> On Oct 2, 2014, at 12:18 PM, Jonathan Thayn wrote:
>>>
>>>> I have four time-series of similar data. I would like to combine these
>>>> into a single, clean time-series. I could simply find the mean of each
>>>> time period, but I think that using principal components analysis should
>>>> extract the most salient pattern and ignore some of the noise. I can
>>>> compute components using princomp >
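For comparison with the SVD reply at the top of this message, here is a minimal sketch of the same rank-1 reconstruction using prcomp(). It differs from the reply in that prcomp() centers each column first, so the column means are added back afterwards. The simulated matrix d is a hypothetical stand-in for cbind(d1, d2, d3, d4), and taking row means of the reconstruction as the single "filtered" series is only one possible choice.

```r
set.seed(1)
# Hypothetical stand-in for d1..d4: one shared signal plus independent noise
signal <- 140 + 30 * sin(seq(0, pi, length.out = 52))
d <- sapply(1:4, function(i) signal + rnorm(52, sd = 5))
colnames(d) <- paste0("d", 1:4)

# PCA on the centered (not rescaled) columns
pc <- prcomp(d, center = TRUE, scale. = FALSE)

# Rank-1 reconstruction: PC1 scores times PC1 loadings, then add the means back
rank1 <- pc$x[, 1, drop = FALSE] %*% t(pc$rotation[, 1, drop = FALSE])
rank1 <- sweep(rank1, 2, pc$center, "+")

# One series standing for all four: row means of the reconstruction
filtered <- rowMeans(rank1)

# Sanity check: using all components recovers the original data
full <- sweep(pc$x %*% t(pc$rotation), 2, pc$center, "+")
all.equal(full, d, check.attributes = FALSE)
```

Because the scores are centered, each column of rank1 keeps the mean of the corresponding original series; only the variation not captured by the first component is dropped.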
Re: [R] Using PCA to filter a series
I think you want to convert your principal component to the same scale as d1, d2, d3, and d4. But the "original space" is a 4-dimensional space in which d1, d2, d3, and d4 are the axes, each with its own mean and standard deviation. Here are a couple of possibilities

# plot original values for comparison
> matplot(cbind(d1, d2, d3, d4), pch=20, col=2:5)
# standardize the pc scores to the grand mean and sd
> new1 <- scale(pca$scores[,1])*sd(c(d1, d2, d3, d4)) + mean(c(d1, d2, d3, d4))
> lines(new1)
# Use least squares regression to predict the row means for the original four variables
> new2 <- predict(lm(rowMeans(cbind(d1, d2, d3, d4))~pca$scores[,1]))
> lines(new2, col="red")

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Don McKenzie
Sent: Thursday, October 2, 2014 4:39 PM
To: Jonathan Thayn
Cc: r-help@r-project.org
Subject: Re: [R] Using PCA to filter a series

On Oct 2, 2014, at 2:29 PM, Jonathan Thayn wrote:

> Hi Don. I would like to "de-rotate" the first component back to its original
> state so that it aligns with the original time-series. My goal is to create a
> "cleaned", or a "model" time-series from which noise has been removed.

Please cc the list with replies. It's considered courtesy plus you'll get more help that way than just from me.

Your goal sounds almost metaphorical, at least to me. Your first axis "aligns" with the original time series already in that it captures the dominant variation across all four. Beyond that, there are many approaches to signal/noise relations within time-series analysis. I am not a good source of help on these, and you probably need a statistical consult (locally?), which is not the function of this list.
> > > Jonathan Thayn > > > > On Oct 2, 2014, at 2:33 PM, Don McKenzie wrote: > >> >> On Oct 2, 2014, at 12:18 PM, Jonathan Thayn wrote: >> >>> I have four time-series of similar data. I would like to combine these >>> into a single, clean time-series. I could simply find the mean of each time >>> period, but I think that using principal components analysis should extract >>> the most salient pattern and ignore some of the noise. I can compute >>> components using princomp >>> >>> >>> d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110, >>> 110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165, >>> 173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101, >>> 114, 114, 107, 108, 108, 131, 131, 117, 113) >>> d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120, >>> 115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181, >>> 181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145, >>> 147, 147, 124, 124, 120, 128, 141, 141, 138) >>> d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122, >>> 115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181, >>> 181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149, >>> 156, 156, 125, 125, 115, 139, 140, 140, 138) >>> d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121, >>> 114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182, >>> 182, 182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149, >>> 149, 149, 124, 124, 119, 131, 135, 135, 134) >>> >>> >>> pca <- princomp(cbind(d1,d2,d3,d4)) >>> plot(pca$scores[,1]) >>> >>> This seems to have created the clean pattern I want, but I would like to >>> project the first component back into the original axes? Is there a simple >>> way to do that? >> >> Do you mean that you want to scale the scores on Axis 1 to the mean and >> range of your raw data? 
Or their mean and variance? >> >> See >> >> ?scale >>> >>> >>> >>> >>> Jonathan B. Thayn >>> >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> Don McKenzie >> Research Ecologist >> Pacific WIldland Fire Sciences Lab >> US Forest Service >> >> Affiliate P
Re: [R] Converting factor data into Date-time format
First, use stringsAsFactors=FALSE with the read.csv() function. That will prevent the conversion to factors. Then try to convert date and time to datetime objects.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of tandi perkins
Sent: Tuesday, September 30, 2014 12:55 PM
To: r-help@r-project.org
Subject: [R] Converting factor data into Date-time format

Hello R help:
I am new to this forum so I apologize in advance for any protocol missteps. I have a data set comprising eight birds with GPS, each of which transmits every day at 8:00 am, 4:00 pm, and midnight for 1 year (although I have some missing relocations). I am trying to format my data to be run in adehabitatLT but I have been unsuccessful. I have a CSV file with the following header: "Craneid, Date, Time, Long, Lat, Habitat, BurstID". R creates factor levels in all of the data except Lat and Long. I have attempted the following to correctly format my date and time factors (data=l10):

First attempt:
1. datetime=as.POSIXct(paste(l10$Date, l10$Time), format="%m/%d/%Y %H:%M:%S", "America/Chicago")
2. coord=data.frame((l10$Longitude), (l10$Latitude))
3. test=as.ltraj(coord, datetime, l10$Craneid, burst=l10$ID, typeII=TRUE)
Results: Error in as.ltraj(coord, datetime, l10$Craneid, burst = l10$ID, typeII = TRUE) : non unique dates for a given burst

I researched this error on the listserv and found that I could have duplicates, so I checked for duplicates in datetime and the return was NULL (I also checked for duplicates in Excel as I am in the learning stages in R). Next I read a thread posted on R-help in 2012 with a similar problem, so I attempted what was suggested, as follows:
1. datetime=as.POSIXct(strptime(as.character(l10$Date, l10$Time), format="%m/%d/%Y %H:%M:%S"))
2. test=as.ltraj(coord, datetime, l10$Craneid, burst=l10$ID, typeII=TRUE)
Results: Same error.

Finally, I have tried:
1. datetime=as.POSIXct(as.character(levels(l10$Date)(l10$Time)), format="%m/%d/%Y %H:%M:%S")[l10$Date][l10$Time]
Results: Error in as.POSIXct(as.character(levels(l10$Date)(l10$Time)), format = "%m/%d/%Y %H:%M:%S") : attempt to apply non-function

Can someone please explain what I am doing wrong? My goal is to obtain trajectories for all birds using each bird as a burst as is detailed in the adehabitatLT manual and then to create Biased Random Bridges for each bird. I did not include my data but I can if that will be helpful.
Thank you in advance for your help,
TLP
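The advice in the reply can be sketched as follows. The three-row data frame is hypothetical (the poster's file is not shown), and the date and time layout "%m/%d/%Y %H:%M:%S" is taken from the poster's own format string, so it is an assumption about the real data. The two checks at the end are the kind of diagnostics that may help explain a "non unique dates for a given burst" error: datetimes that failed to parse, and repeated datetimes within one bird.

```r
# Hypothetical rows mimicking the Craneid, Date and Time columns
l10 <- data.frame(
  Craneid = c("A", "A", "B"),
  Date    = c("01/15/2014", "01/15/2014", "01/15/2014"),
  Time    = c("08:00:00", "16:00:00", "08:00:00"),
  stringsAsFactors = FALSE   # keep text columns as character, as advised above
)

# Combine date and time, then parse with an explicit, named tz argument
datetime <- as.POSIXct(paste(l10$Date, l10$Time),
                       format = "%m/%d/%Y %H:%M:%S",
                       tz = "America/Chicago")

# Check 1: values that did not match the format come back as NA
anyNA(datetime)

# Check 2: the same datetime occurring twice for the same bird
dup <- duplicated(data.frame(id = l10$Craneid, datetime = datetime))
any(dup)
```

With a real file the first step would be read.csv(..., stringsAsFactors = FALSE); rows flagged by either check are worth inspecting before calling as.ltraj().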
Re: [R] adding rows
Another approach

fun <- function(i, dat=x) {
  grp <- rep(1:(nrow(dat)/i), each=i)
  aggregate(dat[1:length(grp),]~grp, FUN=sum)
}
lapply(2:6, fun, dat=TT)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
Sent: Thursday, September 25, 2014 3:34 PM
To: eliza botto; r-help@r-project.org
Subject: Re: [R] adding rows

Hello,

Try the following.

fun <- function(x, r){
  if(r > 0){
    m <- length(x) %/% r
    y <- numeric(m)
    for(i in seq_len(m)){
      y[i] <- sum(x[((i - 1)*r + 1):(i*r)])
    }
    y
  }else{
    NULL
  }
}

apply(TT, 2, fun, r = 2)
apply(TT, 2, fun, r = 3)
etc

Hope this helps,
Rui Barradas

On 25-09-2014 20:50, eliza botto wrote:
> Dear useRs,
> Here is my data with two columns and 20 rows.
>> dput(TT)
> structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
> 19, 20, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 264, 288, 312, 336,
> 360, 384, 408, 432, 456, 480), .Dim = c(20L, 2L), .Dimnames = list(NULL,
> c("", "SS")))
> First of all, I want to sum up every two consecutive rows (1 & 2, 3 & 4, 5 & 6 and
> so on) of each column.
> Then I want to sum up 3 rows as (1-2-3, 4-5-6, ..., 16-17-18), and since the 19th
> and 20th rows do not make up 3 rows, they should be ignored.
> Similarly with sets of 4 rows, 5 rows, and even 6.
> I hope I was clear.
> Thank you so very much in advance,
> Eliza
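A third option, sketched here with base R's rowsum(), which sums the rows of a matrix by group in one call. The function name block_sums is hypothetical, and the first column is named "A" only for illustration (it has an empty name in the original dput output). Leftover rows that do not fill a complete block are dropped, as the question asks.

```r
# Same values as the TT matrix in the question
TT <- cbind(A = 1:20, SS = seq(24, 480, by = 24))

# Sum consecutive blocks of k rows, ignoring an incomplete final block
block_sums <- function(dat, k) {
  n_blocks <- nrow(dat) %/% k                  # number of complete blocks
  grp <- rep(seq_len(n_blocks), each = k)      # block label for each kept row
  rowsum(dat[seq_along(grp), , drop = FALSE], grp)
}

# Block sizes 2 to 6, one result matrix per size
res <- lapply(2:6, function(k) block_sums(TT, k))
res[[2]]   # blocks of 3 rows: 6 blocks, rows 19 and 20 ignored
```

Using %/% for the block count makes the truncation explicit instead of relying on how 1:(nrow(dat)/i) treats a non-integer upper limit.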
Re: [R] Cluster -- Agnes function
Read the documentation for cutree(). You will have to decide how many clusters you want to use since agnes() provides results for everything from n clusters (where n is the number of observations) to 1 cluster.

?cutree

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sohail Khan
Sent: Wednesday, September 24, 2014 9:14 AM
To: r-help@r-project.org
Subject: [R] Cluster -- Agnes function

Dear All,
I have clustered a patient data set by agnes. I want to extract information for each cluster, i.e. all row ids belonging to each cluster.
Thank you.
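A minimal sketch of what the reply describes, using the built-in USArrests data as a stand-in for the patient data set (which is not shown) and k = 3 as an arbitrary choice of the number of clusters. The agnes result is converted with as.hclust() before cutree() is applied.

```r
library(cluster)

# Agglomerative clustering on standardized variables (stand-in data)
ag <- agnes(scale(USArrests), method = "average")

# Cut the tree into k groups; k = 3 is only an example
cl <- cutree(as.hclust(ag), k = 3)

# Row ids (here, row names) belonging to each cluster
members <- split(rownames(USArrests), cl)
members

# Cluster sizes
table(cl)
```

cutree() returns one cluster number per observation in the original row order, so cl can also be added to the data as a new column.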
Re: [R] Copying tables from R to Excel
If you looked at the documentation for R2HTML you might have noticed that there is no function HTML.matrix. Perhaps your recommendation from an unnamed source is out of date? Assuming you loaded the package with library(R2HTML) as Ivan suggested, the command would be

HTML( summary(iris), file("clipboard", "w"), append=F )

which will work just fine as long as you are using the Windows operating system. More technically, HTML() is a generic function with methods (156 in this case) for many different data types including matrices and tables.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
Sent: Tuesday, September 23, 2014 8:12 AM
To: r-help@r-project.org
Subject: Re: [R] Copying tables from R to Excel

library(R2HTML) ??

On 23/09/14 15:04, Angel Rodriguez wrote:
> Dear Subscribers,
>
> I've found this recommendation to paste an R table to Excel:
>
> HTML.matrix( summary(iris), file("clipboard", "w"), append=F )
> # paste into Excel
>
> After installing R2HTML and writing that command, I get:
>
> Error: could not find function "HTML.matrix"
>
> Any clue?
>
> Thank you very much,
>
> Angel Rodríguez-Laso
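A different technique from the R2HTML call above, shown only as a base-R alternative: tab-separated text also pastes into separate Excel cells. This sketch writes to a temporary file so that it runs on any platform; on Windows, passing file = "clipboard" to write.table() sends the same text to the clipboard instead. The object names tab, tmp and back are illustrative.

```r
# A small table to export
tab <- head(iris)

# Write tab-separated text; on Windows, file = "clipboard" can be used here
tmp <- tempfile(fileext = ".txt")
write.table(tab, file = tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the layout survived the round trip
back <- read.delim(tmp)
back
```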
Re: [R] Pseudo R squared for quantile regression with replicates
It is hard to say because we do not have enough information. R has approximately 6,000 packages and you have not told us which ones you are using. You have not told us much about your data and you have not told us where to find the query from August 2006. The basic problem is that your "fit" is not the same as the "f" in the query.

Your fit object is not very complicated. If you look at the output from str(fit) you will see that fit is an "atomic" vector (note the wording in your error message) with a series of attributes that are probably documented in the help pages for the functions you are using. There is nothing called resid inside fit. It is likely that the post you are looking at refers to the output from rq(...) or perhaps predict(rq(...)), but not the output from withReplicates(..., quote(coef(rq(... which is what fit is.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Donia Smaali Bouhlila
Sent: Thursday, September 18, 2014 9:54 AM
To: r-help@r-project.org
Subject: [R] Pseudo R squared for quantile regression with replicates

Hi,
I am a new user of R. I intend to do quantile regressions with complex survey data using the replicate method. I have run the following commands successfully:

mydesign <- svydesign(ids=~IDSCHOOL,strata=~IDSTRATE,data=TUN,nest=TRUE,weights=~TOTWGT)
bootdesign <- as.svrepdesign(mydesign,type="auto",replicates=150)
fit <- withReplicates(bootdesign,quote(coef(rq(Math1~Female+Age+calculator+computer+desk+dictionary+internet+work+Book2+Book3+Book4+Book5+Pedu1+Pedu2+Pedu3+Pedu4+Born1+Born2,tau=0.5,weights=.weights, method="fn"))))

I want to get the pseudo R squared but I failed. I read a query dating from August 2006, "[R] Pseudo R for Quant Reg", and the answer to it:

rho <- function(u,tau=.5)u*(tau - (u < 0))
V <- sum(rho(f$resid, f$tau))

I copied and pasted it, replacing f by fit, and I get this error message, which I do not understand:

Error in fit$resid : $ operator is invalid for atomic vectors

The fit object is likely to be quite complicated. I used str() to see what it looks like:

str (fit)
Class 'svrepstat' atomic [1:19] 713.24 -24.01 -18.37 9.05 7.71 ...
  ..- attr(*, "var")= num [1:19, 1:19] 2839.3 10.2 -122.1 -332.4 -42.3 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
  .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
  .. ..- attr(*, "means")= Named num [1:19] 710.97 -24.03 -18.3 9.39 7.58 ...
  .. .. ..- attr(*, "names")= chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
  ..- attr(*, "statistic")= chr "theta"

How can I retrieve the residuals and calculate the pseudo R squared? Any help please.

--
Dr. Donia Smaali Bouhlila
Associate-Professor
Department of Economics
Faculté des Sciences Economiques et de Gestion de Tunis
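The rho and V lines quoted in the question are the two ingredients of the Koenker and Machado pseudo R squared, R1 = 1 - V_full / V_null, where V_null comes from an intercept-only fit at the same tau. A minimal sketch follows, assuming the residual vectors are taken from fitted rq objects (for example resid(rq(y ~ x, tau = 0.5)) and resid(rq(y ~ 1, tau = 0.5)) from the quantreg package) rather than from the withReplicates() output. The function name pseudo_r1 and the toy residuals are hypothetical, and survey weights are not handled here.

```r
# Check (pinball) loss, as in the quoted 2006 answer
rho <- function(u, tau = 0.5) u * (tau - (u < 0))

# Pseudo R squared from residuals of the full and intercept-only fits
pseudo_r1 <- function(resid_full, resid_null, tau = 0.5) {
  1 - sum(rho(resid_full, tau)) / sum(rho(resid_null, tau))
}

# Toy residuals only, to show the arithmetic:
# V_full = 0.5 + 0.5 + 1 = 2 and V_null = 1 + 1 + 2 = 4
r1 <- pseudo_r1(c(-1, 1, 2), c(-2, 2, 4), tau = 0.5)
r1
```

The value is specific to the chosen tau, so it should be reported together with the quantile it was computed for.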
Re: [R] column names to row names
Here's another approach using stack():

> y <- data.frame(y)
> E <- with(y, data.frame(year, month, day, stack(data.frame(y), select=4:12)))
> colnames(E)[4:5] <- c("discharge", "station")

But there are some differences. For my E:

> str(E)
'data.frame': 36 obs. of 5 variables:
 $ year     : num 1961 1961 1961 1961 1961 ...
 $ month    : num 1 1 1 1 1 1 1 1 1 1 ...
 $ day      : num 1 2 3 4 1 2 3 4 1 2 ...
 $ discharge: num 1 2 3 4 5 6 7 8 9 10 ...
 $ station  : Factor w/ 9 levels "A","B","C","D",..: 1 1 1 1 2 2 2 2 3 3 ...

But for your E:

> str(E)
'data.frame': 36 obs. of 5 variables:
 $ year     : Factor w/ 1 level "1961": 1 1 1 1 1 1 1 1 1 1 ...
 $ month    : num 1 1 1 1 1 1 1 1 1 2 ...
 $ day      : int 1 2 3 4 1 2 3 4 1 2 ...
 $ discharge: Factor w/ 36 levels "1","10","11",..: 1 12 23 31 32 33 34 35 36 2 ...
 $ station  : chr "A" "A" "A" "A" ...

It seems strange that the discharge and year would be factors and station would be character.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman
Sent: Wednesday, September 17, 2014 8:26 AM
To: eliza botto
Cc: r-help@r-project.org
Subject: Re: [R] column names to row names

Use the 'tidyr' package: your 'month' does not match your desired output -

> x <- structure(c(1961, 1961, 1961, 1961, 1, 1, 1, 1, 1, 2, 3
+ , 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
+ , 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
+ , 28, 29, 30, 31, 32, 33, 34, 35, 36)
+ , .Dim = c(4L, 12L)
+ , .Dimnames = list(NULL, c("year", "month", "day", "A", "B", "C"
+ , "D", "E", "F", "G", "H", "I"))
+ )
> xdf <- as.data.frame(x)
> xdf
  year month day A B  C  D  E  F  G  H  I
1 1961     1   1 1 5  9 13 17 21 25 29 33
2 1961     1   2 2 6 10 14 18 22 26 30 34
3 1961     1   3 3 7 11 15 19 23 27 31 35
4 1961     1   4 4 8 12 16 20 24 28 32 36
> require(tidyr)
> require(dplyr)
> xdf %>% gather(station, discharge, -year, -month, -day)
   year month day station discharge
1  1961     1   1       A         1
2  1961     1   2       A         2
3  1961     1   3       A         3
4  1961     1   4       A         4
5  1961     1   1       B         5
6  1961     1   2       B         6
7  1961     1   3       B         7
8  1961     1   4       B         8
9  1961     1   1       C         9
10 1961     1   2       C        10
11 1961     1   3       C        11
12 1961     1   4       C        12
13 1961     1   1       D        13
14 1961     1   2       D        14
15 1961     1   3       D        15
16 1961     1   4       D        16
17 1961     1   1       E        17
18 1961     1   2       E        18
19 1961     1   3       E        19
20 1961     1   4       E        20
21 1961     1   1       F        21
22 1961     1   2       F        22
23 1961     1   3       F        23
24 1961     1   4       F        24
25 1961     1   1       G        25
26 1961     1   2       G        26
27 1961     1   3       G        27
28 1961     1   4       G        28
29 1961     1   1       H        29
30 1961     1   2       H        30
31 1961     1   3       H        31
32 1961     1   4       H        32
33 1961     1   1       I        33
34 1961     1   2       I        34
35 1961     1   3       I        35
36 1961     1   4       I        36
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Sep 17, 2014 at 8:28 AM, eliza botto wrote:
> Dear useRs,
> I have a data frame "y" starting from 1961 to 2010 in the following manner
> (where A,B,C .., I are station names and the values under these are
> "discharge" values.)
>> dput(y)
> structure(c(1961, 1961, 1961, 1961, 1, 1, 1, 1, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
> 27, 28, 29, 30, 31, 32, 33, 34, 35, 36), .Dim = c(4L, 12L), .Dimnames =
> list(NULL, c("year", "month", "day", "A", "B", "C", "D", "E", "F", "G", "H",
> "I")))
>
> I want it to be in the following manner "E" where the station names are in a
> separate column and all discharge values are in one column.
>> dput(E)
>
> structure(list(year = structure(c(1L, 1L, 1L, 1L, 1
Re: [R] chi-square test
Rick's question is a good one. It is unlikely that the results will be informative, but from a technical standpoint, you can estimate the p value using the simulate.p.value=TRUE argument to chisq.test(). > chisq.test(TT, simulate.p.value=TRUE) Pearson's Chi-squared test with simulated p-value (based on 2000 replicates) data: TT X-squared = 7919.632, df = NA, p-value = 0.0004998 ----- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rick Bilonick Sent: Monday, September 15, 2014 10:18 AM To: r-help@r-project.org Subject: Re: [R] chi-square test On 09/15/2014 10:57 AM, eliza botto wrote: > Dear useRs of R, > I have two datasets (TT and SS) and i wanted to to see if my data is > uniformly distributed or not?I tested it through chi-square test and results > are given at the end of it.Now apparently P-value has a significant > importance but I cant interpret the results and why it says that "In > chisq.test(TT) : Chi-squared approximation may be incorrect" > ### >> dput(TT) > structure(list(clc5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.26, 0.14, 0, 0.44, > 0.26, 0, 0, 0, 0, 0, 0, 0.11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.16, > 0.56, 0, 1.49, 0, 0.64, 0.79, 0.66, 0, 0, 0.17, 0, 0, 0, 0, 0.56, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0.43, 0.41, 0, 0.5, 0.44, 0, 0, 0, 0, 0.09, 0.46, 0, > 0.27, 0.45, 0.15, 0.31, 0.16, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0, > 0, 0, 0, 0.06, 0, 0.09, 0.07, 0, 0, 7.89, 0, 0.22, 0.29, 0.33, 0.27, 0, 0.36, > 0.41, 0, 0, 0, 0, 0.55, 0.81, 0, 0.09, 0.13, 0.28, 0, 0, 0), quota_massima = > c(1167L, 1167L, 4572L, 3179L, 3141L, 585L, 585L, 876L, 876L, 1678L, 2667L, > 1369L, 1369L, 1369L, 1381L, 1381L, 1381L, 1381L, 2284L, 410L, 2109L, 2507L, > 2579L, 2507L, 1436L, 3234L, 3234L, 3234L, 3234L, 2792L, 2569L, 2569L, 2569L, > 1669L, 4743L, 4743L, 4743L, 3403L, 3197L, 3267L, 3583L, 3583L, 3583L, 2584L, 
> 2584L, 2579L, 1241L, 1241L, 4174L, 3006L, 3197L, 2366L, 2618L, 2670L, 4487L, > 3196L, 3196L, 2107L, 2107L, 2427L, 1814L, 2622L, 1268L, 1268L, 1268! L,! >3885L, 3885L, 3092L, 3234L, 2625L, 2625L, 3760L, 4743L, 3707L, 3760L, > 4743L, 3760L, 3885L, 3760L, 4743L, 2951L, 782L, 2957L, 3343L, 2697L, 2697L, > 3915L, 2277L, 1678L, 1678L, 3197L, 2957L, 2957L, 2957L, 4530L, 4530L, 4530L, > 2131L, 3618L, 3618L, 3335L, 2512L, 2390L, 1616L, 3526L, 3197L, 3197L, 2625L, > 2622L, 3197L, 3197L, 2622L, 2622L, 2622L, 368L, 4572L, 3953L, 863L, 3716L, > 3716L, 3716L, 2697L, 2697L, 1358L)), .Names = c("clc5", "quota_massima"), > class = "data.frame", row.names = c(NA, -124L)) > >> chisq.test(TT) > Pearson's Chi-squared test > data: TT > X-squared = 411.5517, df = 123, p-value < 2.2e-16 > Warning message: > In chisq.test(TT) : Chi-squared approximation may be incorrect > ### >> dput(SS) > structure(list(NDVIanno = c(0.57, 0.536, 0.082, 0.262, 0.209, 0.539, 0.536, > 0.543, 0.588, 0.599, 0.397, 0.63, 0.616, 0.644, 0.579, 0.597, 0.617, 0.622, > 0.548, 0.528, 0.541, 0.436, 0.509, 0.467, 0.534, 0.412, 0.324, 0.299, 0.41, > 0.462, 0.427, 0.456, 0.508, 0.581, 0.242, 0.291, 0.324, 0.28, 0.291, 0.305, > 0.365, 0.338, 0.399, 0.516, 0.357, 0.558, 0.605, 0.638, 0.191, 0.377, 0.325, > 0.574, 0.458, 0.426, 0.188, 0.412, 0.464, 0.568, 0.582, 0.494, 0.598, 0.451, > 0.577, 0.572, 0.602, 0.321, 0.38, 0.413, 0.427, 0.55, 0.437, 0.481, 0.425, > 0.234, 0.466, 0.464, 0.491, 0.463, 0.489, 0.435, 0.267, 0.564, 0.256, 0.156, > 0.476, 0.498, 0.122, 0.508, 0.582, 0.615, 0.409, 0.356, 0.284, 0.285, 0.444, > 0.303, 0.478, 0.557, 0.345, 0.408, 0.347, 0.498, 0.534, 0.576, 0.361, 0.495, > 0.502, 0.553, 0.519, 0.504, 0.53, 0.547, 0.559, 0.505, 0.557, 0.377, 0.36, > 0.613, 0.452, 0.397, 0.277, 0.42, 0.443, 0.62), delta_z = c(211L, 171L, 925L, > 534L, 498L, 50L, 53L, 331L, 135L, 456L, 850L, 288L, 286L, 233L, 342L, ! 27! 
> 4L, 184L, 198L, 312L, 67L, 476L, 676L, 349L, 873L, 65L, 963L, 553L, 474L, > 948L, 1082L, 616L, 704L, 814L, 450L, 865L, 987L, 1265L, 720L, 565L, 652L, > 941L, 822L, 1239L, 929L, 477L, 361L, 199L, 203L, 642L, 788L, 818L, 450L, > 703L, 760L, 711L, 1015L, 1351L, 195L, 511L, 617L, 296L, 604L, 381L, 389L, > 287L, 1043L, 1465L, 963L, 1125L, 582L, 662L, 1424L, 1762L, 575L, 1477L, > 1364L, 1236L, 1483L, 1201L, 1644L, 498L, 142L, 510L, 482L, 811L, 788L, 466L, > 626L, 461L, 350L, 1177L, 826L, 575L, 568L, 916L, 767L, 1017L, 532L, 1047L, > 1370L, 902L, 686L, 703L, 440L, 1016L, 1148L, 1089L, 753L, 65
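A minimal sketch of the simulated p-value approach from the reply in this thread. The `counts` matrix is made-up illustration data, not the poster's TT. With B replicates the smallest p-value chisq.test() can report is 1/(B + 1), which for the default B = 2000 is the 0.0004998 shown in the reply, so that value only says that none of the simulated tables was as extreme as the observed one.

```r
# Hypothetical small contingency table (illustration only, not the TT data)
counts <- matrix(c(3, 1, 0,
                   4, 2, 5,
                   1, 0, 6), nrow = 3)

set.seed(42)                       # the simulation is random; fix the seed
res <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)

res$parameter                      # df is NA when the p-value is simulated
res$p.value                        # Monte Carlo estimate of the p-value

# Smallest p-value that can be reported with B = 2000 replicates
min_p <- 1 / (2000 + 1)
min_p
```

Increasing B lowers that floor at the cost of more computation.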
Re: [R] apply block of if statements with menu function
I think switch() should work for you here, but it is not clear how much flexibility you are trying to have (different tests based on the first response; different tests based on the first, then the second response; different tests based on each successive response).

?switch

For the second question, just index the choices by the return value:

> let <- letters[1:4]
> let[menu(let)]

1: a
2: b
3: c
4: d

Selection: 3
[1] "c"

Or a bit more polished:

> cat("Choice: ", let[menu(let)], "\n")

1: a
2: b
3: c
4: d

Selection: 4
Choice:  d

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of r...@openmailbox.org
Sent: Monday, September 15, 2014 3:53 AM
To: r-help@r-project.org
Subject: [R] apply block of if statements with menu function

Subscribers,

For a menu:

menu(c('a','b','c','d'))

How to create a function that will apply to specific menu choice objects? For example:

object1<-function (menuifchoices) {
menu1<-menu(c('a','b','c','d'))
if (menu1==1) ...
menu1a<-menu...
if (menu1a==1) ...
menu2a<-menu...
if (menu2a==1) ...
menu2 <-menu(c('a','b','c','d'))
if (menu1==2) ...
}

The requested action is that a user can select a menu option that will activate a series of "multiple choice" questions, resulting in "menu1" being activated without menu2 being activated. If someone could direct me to the relevant terminology, thank you.

Separate question; for a menu:

menu(c('a','b','c','d'))

1: a
2: b
3: c
4: d

Selection: 1
[1] 1

is it possible to change the behaviour so that the result of the selection is not the integer, but the original menu choice:

Selection: 1
[1] a
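A minimal sketch of how switch() could dispatch on a menu choice, as suggested in the reply. The function name `run_choice` and the returned strings are hypothetical placeholders for the poster's follow-up questions; the choice is passed in as an argument so the code also runs non-interactively.

```r
# Hypothetical handlers; in an interactive session `choice` would come
# from menu(opts), which returns the index of the selected item (0 on exit).
opts <- c("a", "b", "c", "d")

run_choice <- function(choice) {
  if (choice < 1 || choice > length(opts)) {
    return("no selection")
  }
  # Dispatch on the label rather than the integer, so the branches
  # stay readable if the menu order changes.
  switch(opts[choice],
         a = "ran the 'a' questions",
         b = "ran the 'b' questions",
         "no follow-up defined")      # unnamed last argument is the default
}

run_choice(1)
run_choice(3)
```

Interactively this would be called as run_choice(menu(opts)); nested menus can be handled by calling menu() again inside the relevant branch.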
Re: [R] mice - undefined columns selected
I'm copying the package maintainer who can probably give a more definite answer. I'm getting the same error on your data. I can get a subset of your data to run, e.g.:

d.imp <- mice(d[,c(1:2, 5:6)])

works, but

d.imp <- mice(d[,c(3:4, 7:8)])

fails. That suggests to me that the problem is with your data. There are some very high correlations between variables. Looking at pairwise complete observations, C1 has correlations of .998, .999, and .998 with C2, C3, and C4 while M1 has correlations of .999, .999, and .999 with M2, M3, and M4. The correlations between the C variables and the M variables are also high (consistently greater than .80). You really have only two variables, C and M. This is probably the reason function mice() is failing, but the error message could be more informative.

Since you are only imputing single values, you might be better off with simpler imputation methods. Package VIM has a number of options, of which nearest neighbor and hot deck might work well with your data.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeremy Miles
Sent: Thursday, September 11, 2014 7:49 PM
To: r-help
Subject: [R] mice - undefined columns selected

I've got a problem with the mice package that I don't understand. Here's the code:

library(mice)
d <- read.csv("https://dl.dropboxusercontent.com/u/24381951/employment.csv", as.is=TRUE, row.names=1)
d.imp <- mice(data=d, m=1)

Result is:

Error in `[.data.frame`(data, , jj) : undefined columns selected

I hope I'm doing something foolish, thanks,

Jeremy
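A minimal sketch of the correlation check described in the reply, using pairwise complete observations. The data frame `d_demo` is simulated for illustration and is not the poster's employment data; the 0.99 cutoff is an arbitrary assumption.

```r
# Simulated stand-in data: C2 is almost a copy of C1, M1 is unrelated
set.seed(1)
n <- 50
base_c <- rnorm(n)
d_demo <- data.frame(C1 = base_c,
                     C2 = base_c + rnorm(n, sd = 0.02),
                     M1 = rnorm(n))
d_demo$C2[c(3, 17)] <- NA            # a few missing values, as in the thread

# Correlations computed on the cases available for each pair
r <- cor(d_demo, use = "pairwise.complete.obs")

# List each pair (upper triangle only) whose correlation is nearly 1
high <- which(abs(r) > 0.99 & upper.tri(r), arr.ind = TRUE)
data.frame(var1 = rownames(r)[high[, "row"]],
           var2 = colnames(r)[high[, "col"]])
```

Pairs flagged this way are candidates for dropping or combining before imputation.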
Re: [R] Margins to fill matrix
You want r2dtable():

> ?r2dtable
> set.seed(42)
> a <- r2dtable(1, seats, mandates)
> addmargins(a[[1]])
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    2    3    1    0    6    2    1    2   17
 [2,]    8    0    1    1   11    0    2    1   24
 [3,]    8    0    5    2    7    1    1    4   28
 [4,]   10    5    3    1    6    3    0    2   30
 [5,]   13    4    1    4    9    0    2    1   34
 [6,]    8    2    2    0   17    3    4    0   36
 [7,]   13    0    2    6    9    2    3    5   40
 [8,]   12    4    4    3   12    3    3    3   44
 [9,]   14    3    3    2   18    0    4    2   46
[10,]   19    2    2    0   17    5    5    0   50
[11,]  107   23   24   19  112   19   25   20  349

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Stefan Petersson
Sent: Thursday, September 11, 2014 7:13 AM
To: Charles Determan Jr
Cc: r-help@r-project.org
Subject: Re: [R] Margins to fill matrix

I have:

rs <- c(3, 2, 3, 4)
cs <- c(4, 5, 3)

And want:

> matrix
     [,1] [,2] [,3]
[1,]    1    2    0
[2,]    1    0    1
[3,]    1    1    1
[4,]    1    2    1

The rowSums of the above matrix are equal to rs and the colSums are equal to cs. It's sort of a matrix expansion where the margins are known beforehand... I hope I make sense.

2014-09-11 14:09 GMT+02:00 Charles Determan Jr :
> Do you have an example of what you would like your output to look like? It
> is a little difficult to fully understand what you are looking for. You
> only have 18 values but are looking to fill a 10x8 matrix (i.e. 80 values).
> If you can clarify better we may be better able to help you.
>
> Charles
>
> On Thu, Sep 11, 2014 at 3:47 AM, Stefan Petersson wrote:
>>
>> Hi,
>>
>> I have two vectors of margins. Now I want to create a "fill" matrix that
>> reflects the margins.
>>
>> seats <- c(17,24,28,30,34,36,40,44,46,50)
>> mandates <- c(107,23,24,19,112,19,25,20)
>>
>> Both vectors add up to 349. So I want a 10x8 matrix with row sums
>> corresponding to "seats" and column sums corresponding to "mandates".
>
> --
> Dr. Charles Determan, PhD
> Integrated Biosciences
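A short sketch that checks the margins of an r2dtable() result against the vectors from this thread. Note that r2dtable() draws a random table with the given margins, so this is one of many matrices that satisfy them, not a unique solution; the exact cell values depend on the seed and R version.

```r
seats    <- c(17, 24, 28, 30, 34, 36, 40, 44, 46, 50)
mandates <- c(107, 23, 24, 19, 112, 19, 25, 20)

# r2dtable() requires the two sets of margins to have the same total
stopifnot(sum(seats) == sum(mandates))

set.seed(42)
m <- r2dtable(1, seats, mandates)[[1]]   # list of length 1; take the matrix

dim(m)                                   # 10 rows by 8 columns
all(rowSums(m) == seats)                 # row margins reproduced
all(colSums(m) == mandates)              # column margins reproduced
```

Asking for more than one table, e.g. r2dtable(5, seats, mandates), returns a list of alternative fills with the same margins.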
Re: [R] incorrect number of dimensions
Look below to see what happens to your formatting when you use html. Don't use html.

Why do you use x='df' in defining the function? df is a data frame with 5 observations and 4 variables. 'df' is a character vector of length 1. Your function is looking for a data frame (or matrix) with at least 4 columns.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Marie-Eve St-Onge
Sent: Thursday, September 11, 2014 10:53 AM
To: r-help@r-project.org
Subject: [R] incorrect number of dimensions

Dear all,

I'm trying the following experiment simulation, but I'm receiving this error:

> probs()Error in x[j, 4] : incorrect number of dimensions

however, the simulation works fine outside the function statement{}. What am I doing wrong?

# Create some fake data and call the function:
df <- data.frame(y1 = rpois(5, 9),y2 = rpois(5, 7), y3 = rpois(5, 8), n = rpois(5, 100))
probs = function(x='df', j=5, export=1){ p=gtools::rdirichlet(10, x[j,4] * c(x[j,1],x[j,2],x[j,3], 1-x[j,1]-x[j,2]-x[j,3])/100+1 )if(export==1){ mean(p[,1] > p[,3])} else { return(p)} }

Eve

[[alternative HTML version deleted]]
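A minimal sketch of the difference between the default x='df' (a character string) and x=df (the data frame itself). The functions `bad` and `good` are hypothetical stand-ins for the poster's probs(), reduced to the indexing step that fails, and the data are fixed numbers rather than rpois() draws.

```r
df <- data.frame(y1 = 1:5, y2 = 6:10, y3 = 11:15, n = 101:105)

# Default is the string "df": a length-1 character vector with no dimensions
bad  <- function(x = 'df', j = 5) x[j, 4]

# Default is the object df: a data frame with 5 rows and 4 columns
good <- function(x = df, j = 5) x[j, 4]

# Capture the error message instead of stopping the script
msg <- tryCatch(bad(), error = function(e) conditionMessage(e))
msg

good()   # element in row 5, column 4 of df
```

Passing the data frame explicitly, e.g. good(df), avoids relying on a global object at all.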
Re: [R] create new column by replacing multiple unique values in existing column
Note that in the data you sent, b is a factor:

> str(dat1)
'data.frame':   15 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10 ...
 $ b: Factor w/ 3 levels "A1","A2","B1": 1 1 1 1 1 2 2 2 2 2 ...

So all you need is

> dat1$new <- as.numeric(dat1$b)
> table(dat1$new)

1 2 3
5 5 5
> table(dat1$b)

A1 A2 B1
 5  5  5

If b is not a factor in your table, make it one:

?factor

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of raz
Sent: Thursday, September 11, 2014 10:49 AM
To: r-help@r-project.org
Subject: [R] create new column by replacing multiple unique values in existing column

Hi,

I got the following data frame:

dat1 <- read.table(text="a,b
1,A1
2,A1
3,A1
4,A1
5,A1
6,A2
7,A2
8,A2
9,A2
10,A2
11,B1
12,B1
13,B1
14,B1
15,B1",sep=",",header=T)

I would like to add a new column dat1$new based on column "b" (dat1$b) in which values will be substituted according to their unique values, e.g. "A1" will be "1", "A2" will be "2" and so on (this is only a part of a large table). It would be better if I could change all unique values in dat1 to numbers 1:unique(n). If not, then how do I change all values ("A1","A2","B1") to (1,2,3) in a new column?

Thanks a lot,

Raz

--
\m/
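A short sketch of the factor-to-code conversion on the data from this thread. One assumption to be aware of: since R 4.0.0 read.table() no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed explicitly here to reproduce the situation described in the reply. The column `new2` is an added illustration that numbers the values in order of first appearance instead of alphabetical level order.

```r
dat1 <- read.table(text = "a,b
1,A1
2,A1
3,A1
4,A1
5,A1
6,A2
7,A2
8,A2
9,A2
10,A2
11,B1
12,B1
13,B1
14,B1
15,B1", sep = ",", header = TRUE, stringsAsFactors = TRUE)

# Integer codes follow the (alphabetical) factor levels: A1 = 1, A2 = 2, B1 = 3
dat1$new <- as.integer(dat1$b)

# Alternative: codes follow the order in which values first appear
dat1$new2 <- as.integer(factor(dat1$b, levels = unique(dat1$b)))

table(dat1$b, dat1$new)
```

For these data both columns agree, because the values already appear in alphabetical order.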
Re: [R] KDE routines for data that is aggregated
If the x and y values are regularly spaced, you could use contour() or persp() to plot the densities. If they are not, you can use density(), loess(), gam(), kriging, or another function to estimate a smooth surface for the values, then estimate the values over a regular grid and plot with contour(), etc.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Saptarshi Guha
Sent: Monday, September 8, 2014 6:57 PM
To: R-help@r-project.org
Subject: [R] KDE routines for data that is aggregated

Hello,

Couldn't think of a better subject line. Rather than a matrix like

x,y
..,..
.,..

I have a matrix like

x,y,n,
..,..,..,
..,..,..

and so on. Also, sum(n) is roughly a few hundred million. The number of rows is <1MM.

Are there routines to fit a 2d kde estimate to data provided in this form? I can sample from the data according to weights given by 'n' but I am curious if there is something that can use all the data when given a structure of this form.

Regards
Saptarshi
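A rough sketch of the "regular grid, then contour()" idea for aggregated x, y, n data. This is a weighted two-dimensional histogram (counts summed into grid cells), not a kernel density estimate, and it is offered only as one way to use all of the weights without sampling. The data frame `agg`, the 10 x 10 grid, and the unit-square range are assumptions for illustration.

```r
# Simulated aggregated data: one row per (x, y) location with a count n
set.seed(1)
agg <- data.frame(x = runif(500), y = runif(500), n = rpois(500, 20))

# Regular grid over the unit square and the cell midpoints
brks <- seq(0, 1, length.out = 11)
mids <- (brks[-1] + brks[-length(brks)]) / 2

# Assign each row to a grid cell
agg$xb <- cut(agg$x, brks, include.lowest = TRUE)
agg$yb <- cut(agg$y, brks, include.lowest = TRUE)

# Sum the counts n within each cell, then scale to proportions
counts <- xtabs(n ~ xb + yb, data = agg)
z <- matrix(as.numeric(counts), nrow = length(mids)) / sum(counts)

# Plot the binned surface on the regular grid
contour(mids, mids, z, xlab = "x", ylab = "y")
```

A smoother such as loess() or gam() could then be fitted to the cell proportions if a smooth surface is wanted.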
Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
The big difference between the data sets is that many of your rows (16) have all missing values. None of mine do. If you run my data and yours, you will see that dcast throws a warning "Aggregation function missing: defaulting to length" with your data but not with mine. As a result, instead of using the value of rank, dcast uses length(rank), which is always 1 except when a row has multiple missing values, in which case it is the number of missing values. This problem will occur whenever there is more than one missing value on a row.

The simplest way to handle this is to create a function that returns the first value of a vector and use that with the fun.aggregate= argument:

> first <- function(x) {x[1]}
> d4 <- dcast(d3, row~color, fun.aggregate=first, value.var="rank", fill=0)

The only drawback is that this will not warn you if a category was ranked twice except that the NA column will be zero and one of the other columns will be zero. The number of missing values is the number of zeroes in your category columns (not including row or NA) and the value in NA is the lowest rank that was missing.

David C

-----Original Message-----
From: Simon Kiss [mailto:sjk...@gmail.com]
Sent: Friday, September 5, 2014 10:22 AM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

Hi, of course. A mini-version of my data set is below, stored in d2. The code I'm working with follows.
library(reshape2) #Create d2 structure(list(row = 1:50, rank1 = structure(c(3L, 3L, 3L, 4L, 3L, 3L, NA, NA, 3L, NA, 3L, 3L, 1L, NA, 2L, NA, 3L, NA, 2L, 1L, 1L, 3L, NA, 6L, NA, 1L, NA, 3L, 1L, NA, 1L, NA, NA, 6L, 3L, NA, 1L, 3L, 3L, 4L, 1L, NA, 3L, 3L, 3L, NA, 3L, 3L, NA, 1L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank2 = structure(c(6L, 1L, 1L, 2L, 4L, 6L, NA, NA, 6L, NA, 6L, 4L, 2L, NA, 4L, NA, 6L, NA, 1L, 6L, 3L, 2L, NA, 3L, NA, 6L, NA, 6L, 6L, NA, 3L, NA, NA, 3L, 6L, NA, 6L, 6L, 6L, 7L, 3L, NA, 1L, 6L, 6L, NA, 2L, 6L, NA, 2L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank3 = structure(c(1L, 6L, 4L, 3L, 2L, 4L, NA, NA, 4L, NA, 1L, 1L, 6L, NA, 1L, NA, 1L, NA, 7L, 3L, 6L, 1L, NA, 2L, NA, 4L, NA, 1L, 3L, NA, 6L, NA, NA, 4L, 2L, NA, 7L, 1L, 1L, 6L, 7L, NA, 6L, 1L, 1L, NA, 4L, 1L, NA, 3L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank4 = structure(c(7L, 4L, 2L, 1L, 1L, 7L, NA, NA, 1L, NA, 7L, 2L, 7L, NA, 3L, NA, 2L, NA, 3L, 4L, 5L, 6L, NA, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, NA, NA, 2L, 7L, NA, 2L, 2L, 2L, 3L, 6L, NA, 2L, 5L, 4L, NA, 1L, 2L, NA, 4L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank5 = structure(c(2L, 7L, 6L, 7L, 7L, 2L, NA, NA, 2L, NA, 2L, 7L, 3L, NA, 6L, NA, 7L, NA, 6L, 7L, 4L, 7L, NA, 7L, NA, 7L, NA, 2L, 2L, NA, 2L, NA, NA, 7L, 1L, NA, 3L, 7L, 4L, 2L, 2L, NA, 4L, 2L, 2L, NA, 6L, 4L, NA, 5L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank6 = structure(c(4L, 2L, 7L, 6L, 6L, 1L, NA, NA, 7L, NA, 4L, 5L, 4L, NA, 7L, NA, 4L, NA, 4L, 2L, 2L, 4L, NA, 1L, NA, 2L, NA, 7L, 7L, NA, 7L, NA, NA, 1L, 4L, NA, 4L, 4L, 7L, 1L, 4L, NA, 7L, 7L, 7L, NA, 7L, 7L, NA, 7L), .Label 
= c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank7 = structure(c(5L, 5L, 5L, 5L, 5L, 5L, NA, NA, 5L, NA, 5L, 6L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, 7L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, NA, 5L, NA, 5L, 5L, 5L, NA, 5L, 4L, 5L, NA, 5L, 5L, NA, 6L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor")), .Names = c("row", "rank1", "rank2", "rank3", "rank4", "rank5", "rank6", "rank7"), row.names = c(NA, 50L), class = "data.frame") #This code is a replication of David Carlson's code (below) which works splendidly, but does not work on my data-set #Melt d2: Note, I've used value.name='color' to maximi
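A small base-R sketch of why aggregating with "first" instead of length matters when a row has several missing values. It uses tapply() rather than reshape2::dcast(), so it is an analogue of the fix described in this thread and not the same call; the `long` data frame is invented for illustration.

```r
first <- function(x) x[1]

# Long format: two respondents, three ranks each; NA = no category chosen
long <- data.frame(row   = c(1, 1, 1, 2, 2, 2),
                   color = c("red", "blue", NA, NA, "red", NA),
                   rank  = c(1, 2, 3, 1, 2, 3))

# Keep NA as its own level so it gets a column of its own
long$color <- addNA(factor(long$color))

# What a length-based aggregation reports: number of entries per cell
n_per_cell <- tapply(long$rank, list(long$row, long$color), length)

# What a first-value aggregation reports: the rank itself
wide <- tapply(long$rank, list(long$row, long$color), first)
wide[is.na(wide)] <- 0        # empty cells become 0, as with fill=0

n_per_cell
wide
```

For respondent 2 the NA column holds 2 under length (two missing ranks) but 1 under first (the lowest missing rank), and the unranked category shows up as 0.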
Re: [R] calculate Euclidean distances between populations in R with this data structure
There may be a specialized package for this in Bioconductor, but it seems that you could just use aggregate() to calculate the means for each population and then use the results of that in dist().

?aggregate

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ding, Yuan Chun
Sent: Thursday, September 4, 2014 3:11 PM
To: r-help@R-project.org
Subject: [R] calculate Euclidean distances between populations in R with this data structure

I want to calculate the Euclidean distance between 12 populations; in each population there are 20 samples and each sample is measured for 100 genes (these are microarray data; the numbers here are just examples). The equation I found is:

distance = sqrt{ [sum(average of xi - average of yi)^2] / n }, i = 1 to n;

where xi and yi are the expression of gene i over two populations with p and q samples (x1, x2,...,xp), (y1, y2,...,yq), and n is the number of genes.

Part of the data is pasted below:

row.names  pop1.1   pop1.2   pop1.3   pop1.4   pop2.1   pop2.2   pop2.3   pop2.4
7A5        5.38194  4.06191  4.88044  5.60383  6.23101  6.53738  4.80336  5.86136
A1BG       5.15155  4.29441  4.59131  4.90026  4.62908  4.48712  4.73039  4.46208
A1CF       4.22396  4.14451  4.41465  3.93179  4.89638  4.66109  4.20918  4.48107
A26C3      12.1969  12.4179  10.9786  11.7659  11.405   11.7594  11.1757  11.8128

How might one calculate these distances in R with this data structure?

Thanks,

Ding
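A minimal sketch of the aggregate() then dist() approach on the four genes posted in this thread. It assumes the population label is the part of each column name before the final ".<number>", and it divides by sqrt(n genes) so that the result matches the poster's formula, since dist() on its own returns the unscaled Euclidean distance.

```r
# The posted values: genes in rows, samples in columns
expr <- matrix(c(
  5.38194, 4.06191, 4.88044, 5.60383, 6.23101, 6.53738, 4.80336, 5.86136,
  5.15155, 4.29441, 4.59131, 4.90026, 4.62908, 4.48712, 4.73039, 4.46208,
  4.22396, 4.14451, 4.41465, 3.93179, 4.89638, 4.66109, 4.20918, 4.48107,
  12.1969, 12.4179, 10.9786, 11.7659, 11.405,  11.7594, 11.1757, 11.8128),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("7A5", "A1BG", "A1CF", "A26C3"),
                  paste0(rep(c("pop1", "pop2"), each = 4), ".", 1:4)))

# Population label = column name without the trailing sample number
pop <- sub("\\.[0-9]+$", "", colnames(expr))

# Mean of each gene within each population (samples must be in rows)
agg <- aggregate(t(expr), by = list(pop = pop), FUN = mean)
pop_means <- as.matrix(agg[, -1])
rownames(pop_means) <- agg$pop

# Euclidean distance between population mean profiles, scaled by sqrt(n genes)
d <- dist(pop_means) / sqrt(ncol(pop_means))
d
```

With 12 populations the same code returns all pairwise distances as a dist object; as.matrix(d) gives the full square matrix.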
Re: [R] depth of labels of axis
The problem with this approach is that the horizontal positioning of the labels is based on the width of the label including the phantom part so that the E's are pushed to the left of the tick mark (at least on my Windows machine). But it does provide a way of dealing with superscripts as long as the phantom is added to each label and hadj= is used to position the label horizontally, eg (changing the last label to a superscript for illustration): lbl <- expression(E[g]~phantom(E[g]), E~phantom(E[g]), E[j]~phantom(E[g]), E~phantom(E[g]), E^t~phantom(E[g])) plot(1:5, xaxt = "n") axis(1, at = 1:5, labels = lbl, hadj=.1) abline(h=.7, xpd=TRUE, lty=3) David C -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius Sent: Thursday, September 4, 2014 2:25 PM To: Jinsong Zhao Cc: r-help@r-project.org Subject: Re: [R] depth of labels of axis On Sep 3, 2014, at 10:05 PM, Jinsong Zhao wrote: > On 2014/9/3 21:33, Jinsong Zhao wrote: >> On 2014/9/2 11:50, David L Carlson wrote: >>> The bottom of the expression is set by the lowest character (which can >>> even change for subscripted letters with descenders. The solution is >>> to get axis() to align the tops of the axis labels and move the line >>> up to reduce the space, e.g. >>> >>> plot(1:5, xaxt = "n") >>> axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]), >>> "E", expression(E[t])), padj=1, mgp=c(3, .1, 0)) >>> # Check alignment >>> abline(h=.7, xpd=TRUE, lty=3) >> >> yes. In this situation, padj = 1 is the fast solution. However, If there >> are also superscript, then it's hard to alignment all the labels. >> >> If R provide a mechanism that aligns the label in axis() or text() with >> the baseline of the character without the super- and/or sub-script, that >> will be terrific. 
>
> it seems that the above wish is on the Graphics TODO lists:
> https://www.stat.auckland.ac.nz/~paul/R/graphicstodos.html
>
> Allow text adjustment for mathematical annotations which is relative to a
> text baseline (in addition to the current situation where adjustment is
> relative to the bounding box).
>

In many cases adding a phantom argument will correct alignment problems:

plot(1:5, xaxt = "n")
axis(1, at = 1:5, labels = c(expression(E[g]), E~phantom(E[g]), expression(E[j]), E~phantom(E[g]), expression(E[t])))
abline(h=.7, xpd=TRUE, lty=3)

Notice that c(expression(.), ...) will coerce all items separated by commas to expressions, so you can just put in "native" expressions that are not surrounded by the `expression`-function:

c(expression(E[g]), E~phantom(E[g]), expression(E[j]) )
#returns
# expression(E[g], E ~ phantom(E[g]), E[j])

The tilde is actually a function that converts parse-able strings into R language objects:

c(expression(E[g]), E~phantom(E[g]), ~E[j])

--
David.

>>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org
>>> [mailto:r-help-boun...@r-project.org] On Behalf Of Jinsong Zhao
>>> Sent: Monday, September 1, 2014 6:41 PM
>>> To: r-help@r-project.org
>>> Subject: [R] depth of labels of axis
>>>
>>> Hi there,
>>>
>>> With the following code,
>>>
>>> plot(1:5, xaxt = "n")
>>> axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
>>> "E", expression(E[t])))
>>>
>>> you may notice that the "E" within labels of axis(1) are not at the same
>>> depth. So the vision of axis(1) labels is something like wave.
>>>
>>> Is there a possible way to typeset the labels so that they have the
>>> same depth?
>>>
>>> Any suggestions will be really appreciated. Thanks in advance.
>>> >>> Best regards, >>> Jinsong David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
I think we would need enough of the data you are using to figure out how to modify the process. Can you use dput() to send a small data set that fails to work? David C -Original Message- From: Simon Kiss [mailto:sjk...@gmail.com] Sent: Thursday, September 4, 2014 1:28 PM To: David L Carlson Cc: r-help@r-project.org Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame Hi David and list: This is working, except at this command mycast <- dcast(mymelt, row~color, value.var="rank", fill=0) dcast is using "length" as the default aggregating function. This results in not accurate results. It tells me, for example how many choices were missing values and it tells me if a person selected any given option (value is reported as 1). When I try to run your reproducible research, it works great, but something with the aggregating function is not working properly with mine. Any other thoughts? Simon On Aug 18, 2014, at 10:44 AM, David L Carlson wrote: > Another approach using reshape2: > >> library(reshape2) >> # Construct data/ add column of row numbers >> set.seed(42) >> mydf <- data.frame(t(replicate(100, sample(c("red", "blue", > + "green", "yellow", NA), 4 >> mydf <- data.frame(rows=1:100, mydf) >> colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4") >> head(mydf) > row rank1 rank2 rank3 rank4 > 1 1yellowred blue > 2 2 yellow green red > 3 3 yellow green blue > 4 4 blue yellow green > 5 5 red blue green > 6 6 red green blue >> # Reshape >> mymelt <- melt(mydf, id.vars=1, measure.vars=2:5, > + variable.name="rank", value.name="color") >> # Convert rank to numeric >> mymelt$rank <- as.numeric(mymelt$rank) >> mycast <- dcast(mymelt, row~color, value.var="rank", fill=0) >> head(mycast) > row blue green red yellow NA > 1 14 0 3 2 1 > 2 20 2 4 1 3 > 3 33 2 0 1 4 > 4 42 4 0 3 1 > 5 53 4 2 0 1 > 6 64 3 2 0 1 > > David C > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of David L 
Carlson > Sent: Sunday, August 17, 2014 6:32 PM > To: Simon Kiss; r-help@r-project.org > Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A > Data Frame > > There is probably an easier way to do this, but > >> set.seed(42) >> mydf <- data.frame(t(replicate(100, sample(c("red", "blue", > + "green", "yellow", NA), 4 >> colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4") >> head(mydf) > rank1 rank2 rank3 rank4 > 1yellowred blue > 2 yellow green red > 3 yellow green blue > 4 blue yellow green > 5 red blue green > 6 red green blue >> lvls <- levels(mydf$rank1) >> # convert color factors to numeric >> for (i in seq_along(mydf)) mydf[,i] <- as.numeric(mydf[,i]) >> # stack the columns >> mydf2 <- stack(mydf) >> # convert rank factor to numeric >> mydf2$ind <- as.numeric(mydf2$ind) >> # add row numbers >> mydf2 <- data.frame(rows=1:100, mydf2) >> # Create table >> mytbl <- xtabs(ind~rows+values, mydf2) >> # convert to data frame >> mydf3 <- data.frame(unclass(mytbl)) >> colnames(mydf3) <- lvls >> head(mydf3) > blue green red yellow > 14 0 3 2 > 20 2 4 1 > 33 2 0 1 > 42 4 0 3 > 53 4 2 0 > 64 3 2 0 > > David C > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of Simon Kiss > Sent: Friday, August 15, 2014 3:58 PM > To: r-help@r-project.org > Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A > Data Frame > > > Both the suggestions I got work very well, but what I didn't realize is that > NA values would cause serious problems. Where there is a missing value, > using the argument na.last=NA to order just returns the the order of the > factor levels, but excludes the missing values, but I have no idea where > those occur in the or rather which of those variables were actually missing. > Have I explained this problem sufficiently? > I didn't think it would cause such a problem so I didn't include it in the > or
Re: [R] wilcox.test - difference between p-values of R and online calculators
Since they all have the same W/U value, it seems likely that the difference is how the different versions adjust the standard error for ties. Here are a couple of posts addressing the issues of ties:

http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and

David C

From: wbradleyk...@gmail.com [mailto:wbradleyk...@gmail.com] On Behalf Of W Bradley Knox
Sent: Wednesday, September 3, 2014 9:20 AM
To: David L Carlson
Cc: Tal Galili; r-help@r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators

Tal and David, thanks for your messages.

I should have added that I tried all variations of true/false values for the exact and correct parameters. Running with correct=FALSE makes only a tiny change, resulting in W = 485, p-value = 0.0002481.

At one point, I also thought that the discrepancy between R and these online calculators might come from how ties are handled, but the fact that R and two of the online calculators reach the same U/W values seems to indicate that ties aren't the issue, since (I believe) the U or W values contain all of the information needed to calculate the p-value, assuming the number of samples is also known for each condition. (However, it's been a while since I looked into how MWU tests work, so maybe now's the time to refresh.) If that's correct, the discrepancy seems to be based in what R does with the W value that is identical to the U values of two of the online calculators. (I'm also assuming that U and W have the same meaning, which seems likely.)

- Brad

W. Bradley Knox, PhD
http://bradknox.net
bradk...@mit.edu

On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson <dcarl...@tamu.edu> wrote:

That does not change the results. The problem is likely to be the way ties are handled. The first sample has 25 values of which 23 are identical (359). The second sample has 26 values of which 12 are identical (359). The difference between the implementations may be a result of the way the ties are ranked. For example the R function rank() offers 5 different ways of handling the rank on tied observations. With so many ties, that could make a substantial difference.

Package coin has wilcox_test() which uses Monte Carlo simulation to estimate the confidence limits.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tal Galili
Sent: Wednesday, September 3, 2014 5:24 AM
To: W Bradley Knox
Cc: r-help@r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators

It seems your numbers have ties. What happens if you run wilcox.test with correct=FALSE, will the results be the same as the online calculators?

Contact Details:
Contact me: tal.gal...@gmail.com | Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox <bradk...@mit.edu> wrote:

> Hi.
>
> I'm taking the long-overdue step of moving from using online calculators to
> compute results for Mann-Whitney U tests to a more streamlined system
> involving R.
>
> However, I'm finding that R computes a different result than the 3 online
> calculators that I've used before (all of which approximately agree).
These > calculators are here: > > http://elegans.som.vcu.edu/~leon/stats/utest.cgi > http://vassarstats.net/utest.html > http://www.socscistatistics.com/tests/mannwhitney/ > > An example calculation is > > > *wilcox.test(c(359,359,359,359,359,359,335,359,359,359,359,359,359,359,359,359,359,359,359,359,359,303,359,359,359),c(332,85,359,359,359,220,231,300,359,237,359,183,286,355,250,105,359,359,298,359,359,359,28.6,359,359,128))* > > which prints > > > > > > > > > > *Wilcoxon rank sum test with continuity correction data: c(359, 359, 359, > 359, 359, 359, 335, 359, 359, 359, 359, 359, and c(332, 85, 359, 359, 359, > 220, 231, 300, 359, 237, 359, 183, 359, 359, 359, 359, 359, 359, 359, 359, > 359, 303, 359, 359, and 286, 355, 250, 105, 359, 359, 298, 359, 359, 359,