Re: [R] [PS] Switching entries in vector in by groups of two
Perhaps

xnew <- x[1:length(x)+c(1,-1)]

will do it.

Ben Fairbank

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Afshartous
Sent: Friday, June 27, 2008 9:11 AM
To: r-help@r-project.org
Subject: [PS] [R] Switching entries in vector in by groups of two

All,

I have a long vector that contains an even number of entries. I'd like to switch the 1st and 2nd entry, the 3rd and 4th, and so on, without writing a loop. This code works:

X = c(8, 10, 6, 3, 20, 1)
index = c(2,1,4,3,6,5)
X[index]

But for a long list is there a way to generate the index? I can get the parts to the index as:

index.odd = seq(1,length(X), by = 2)
index.even = index.odd + 1

Is there a simple way to interweave them to produce the desired index? Or is there a better way?

Cheers,
David

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
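Both ideas in this thread can be checked on David's example vector. The sketch below uses seq_along() in place of 1:length(x) (an editorial substitution, not part of either message) and also interleaves index.even and index.odd with rbind(), which is one possible answer to the "interweave" question. It assumes the vector has even length.

```r
X <- c(8, 10, 6, 3, 20, 1)

# Ben's suggestion: c(1, -1) is recycled, so odd positions get +1
# and even positions get -1
index <- seq_along(X) + c(1, -1)
X[index]

# Interleaving the two half-indexes: rbind() stacks them as rows,
# and as.vector() reads the resulting matrix column by column
index.odd  <- seq(1, length(X), by = 2)
index.even <- index.odd + 1
index2 <- as.vector(rbind(index.even, index.odd))
X[index2]
```

Both forms produce the index 2 1 4 3 6 5 for this vector, so the two subsetting calls return the same swapped result.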
[R] The Green Book and its relevance to R
I bogged down about half way through reading the Green Book, in part because it became increasingly difficult to understand how some of the ideas related to R, as opposed to S (which I have not used). Does any reader know whether there is a document that points out differences between S and R that would be helpful in reading the Green Book? Ideally, perhaps, I need a "crib sheet" to help relate "Programming with data" to R, as opposed to S. And, incidentally, in the opinion of those who have read all three, which of the books, blue, green, or white (or maybe V & R "S programming"?), would be most recommended as the next book for one who would move beyond advanced beginner status? (Programming experience in Fortran, APL, Python, small-system assembly language, but not C). Ben Fairbank San Antonio, Texas [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can plot() be used for multiple plots?
Greetings helpRs --

I would like to use plot() to plot two cumulative distribution curves so that a user of the plot can compare the distributions of the two variables. The following code draws two distributions separately, but I cannot find the instruction necessary to add a second cumulative distribution to the first one. Any suggestion would be very welcome.

x1 <- sort(rnorm(1000,50,10))
x2 <- sort(rnorm(1000,40,8))
plot(x1,1:length(x1)/length(x1),type="l")
plot(x2,1:length(x2)/length(x2),type="l")
grid(col = "black")

Ben Fairbank
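One common way to overlay a second curve in base graphics is to call lines() after the first plot(), since each plot() call starts a fresh figure. The sketch below assumes that is the behaviour wanted; the xlim, line types, axis labels and legend are illustrative choices, not part of the original post.

```r
set.seed(1)
x1 <- sort(rnorm(1000, 50, 10))
x2 <- sort(rnorm(1000, 40, 8))
p  <- seq_along(x1) / length(x1)   # cumulative proportions for x1

# Draw the first curve with an x-range wide enough for both samples
plot(x1, p, type = "l", xlim = range(x1, x2),
     xlab = "value", ylab = "cumulative proportion")

# lines() adds to the existing plot instead of starting a new one
lines(x2, seq_along(x2) / length(x2), lty = 2)

grid(col = "black")
legend("topleft", legend = c("x1", "x2"), lty = 1:2)
```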
Re: [R] [PS] Re: vlookup in R
Quite right, there is an optional 4th argument, and the table must be sorted ascending on the first column in Excel. Thus these functions only approximately duplicate the Excel functions (improve on them IMHO). BTW, I pasted the wrong formula in my reply; though it works, simpler is ID <- 4 #for example, find value corresponding to 4 x[x[,1]==ID,2] Ben -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Winsemius Sent: Monday, March 24, 2008 10:20 AM To: [EMAIL PROTECTED] Subject: [PS] Re: [R] vlookup in R Sachin J <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]: > Is there are function similar to excel vlookup in R. Please let me > know. > Caveat: definition of VLOOKUP done from memory and by checking OO.o Calc function of same name. (Don't have Excel on this machine.) VLOOKUP looks up a single value in the first column of an Excel range and returns a column value (offset by a given integer) from the first matching row in that "range". The indexing functions ("extract" or "[" ) can be used: > df4 V1 V2 V3 1 4.56 1 0.1 2 8.42 1 0.2 3 0.79 3 0.3 4 5.39 3 0.4 5 0.95 4 0.5 6 7.73 5 0.6 7 7.17 6 0.7 8 3.89 7 0.8 9 0.54 10 1.0 10 9.53 9 0.9 > df4[df4$V1==0.79,2] [1] 3 vlookup <- function(val, df, row){ df[df[1] == val, row][1] } > vlookup(0.79, df4, 2) [1] 3 I thought there was an optional 4th argument to VLOOKUP that specifies the action to be taken if there is no exact match. You may need to change the equality in that function to an inequality and identify the first column value that is less than or equal to "val". If I remember correctly, Excel assumes that the first column is ordered ascending. -- David Winsemius __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
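The optional fourth argument discussed above switches VLOOKUP to approximate matching on a first column sorted in ascending order. A minimal R sketch of that behaviour follows, assuming "largest key less than or equal to the lookup value" is the rule wanted. The function name vlookup_approx and the toy table rates are hypothetical and do not come from the thread.

```r
# Approximate lookup: return the entry in column `col` for the largest
# first-column key that is <= val, or NA if val is below every key
vlookup_approx <- function(val, df, col) {
  df <- df[order(df[[1]]), , drop = FALSE]  # sort on the key column
  i <- findInterval(val, df[[1]])           # 0 when val < smallest key
  if (i == 0) NA else df[i, col]
}

rates <- data.frame(min_qty = c(1, 5, 10), price = c(9.5, 8.0, 6.5))
vlookup_approx(7, rates, 2)     # falls in the 5-to-10 band
vlookup_approx(0.5, rates, 2)   # below the smallest key
```

Sorting inside the function removes the need for the caller to keep the table ordered, at the cost of a sort on every call.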
Re: [R] [PS] vlookup in R
Another way: if x is a two-column matrix, as suggested by Henrique D.,

   ID        Value
1   7  0.000656733
2   6  0.201764789
3   1  0.671113391
4  10 -0.739727826
5   9 -1.111310154
6   5 -0.859455833
7   2 -1.408229877
8   8  0.993126295
9   3 -0.171906808
10  4 -0.140107677

and you are looking up the value corresponding to "ID", then

ID <- 4
x[(1:dim(x)[1])[x[,1]==ID],2]

will also do it, and you can vary the value of the 2 in order to query the column of interest, much as you can do with vlookup in the E program.

Ben

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sachin J
Sent: Monday, March 24, 2008 9:25 AM
To: r-help@r-project.org
Subject: [PS] [R] vlookup in R

Hi,

Is there a function similar to Excel's vlookup in R? Please let me know.

Thanks,
Sachin
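For anyone who wants to run the lookup, here is a self-contained version using the matrix values shown in the message. The match() form is an editorial addition, assuming only the first matching row is wanted (closer to what VLOOKUP returns). The names r1, r2 and r3 are illustrative.

```r
x <- cbind(ID    = c(7, 6, 1, 10, 9, 5, 2, 8, 3, 4),
           Value = c(0.000656733, 0.201764789, 0.671113391, -0.739727826,
                     -1.111310154, -0.859455833, -1.408229877, 0.993126295,
                     -0.171906808, -0.140107677))
ID <- 4

r1 <- x[(1:dim(x)[1])[x[, 1] == ID], 2]  # form given in the message
r2 <- x[x[, 1] == ID, 2]                 # plain logical index, all matching rows
r3 <- x[match(ID, x[, 1]), 2]            # first match only, NA if ID is absent
```

With unique IDs all three return the same value; they differ only when an ID is repeated or missing.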
Re: [R] [PS] rmultinomial() function
Mary --

The dmultinom() function (try ?Multinomial, noting that it is an upper case M) has a "log" option, which, if set to TRUE, returns logarithms of probabilities, but that is for computing probabilities, not generating samples. Perhaps the "long" you referred to is a misprint for "log"? In any case, try ?Multinomial, and give rmultinom() another try. Note, however, that its output gives the _number_ of each of the sampled items produced in a single sample, not the sequence of draws. If you need the sequence, then I think that the reply from Erik Iverson tells you how best to proceed.

Ben

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mary Black
Sent: Tuesday, March 18, 2008 5:29 PM
To: r-help@r-project.org
Subject: [PS] [R] rmultinomial() function

After scouring the online R resources and help pages, I still need clarification on the function rmultinomial(). I would like to create a vector, say of 100 elements, where every element in the vector can take on the value of 0, 1 or 2, and where each of those values has a specific probability, i.e. the probability a given element in the vector = 0 is 0.06, 1 = 0.38, 2 = 0.56 (probabilities sum to 1). Can I use the rmultinomial() function to do this? The following code does not seem to produce the result I need, but this sort of code is all I could find in the R "help" pages:

> rmultinomial(100,c(0.06,0.38,0.56))
[1] 3 29 68
> rmultinomial(100,c(0.06,0.38,0.56),long=TRUE)
[1] 3 3 2 2 3 2 3 3 2 3 2 3 3 3 2 2 3 3 2 3 1 2 3 3 3 3 3 3 2 3 3 2 3 3 3 2 3 2 2 3 2 3 2 3 3 3 2 1 3 3 1
[52] 2 3 2 2 3 3 2 2 2 1 3 3 2 3 3 3 3 2 3 3 3 3 2 3 2 3 3 2 3 3 2 3 3 2 3 2 3 3 2 3 3 3 2 3 2 2 2 2 2

Also, I don't really understand the difference between the default long=FALSE and long=TRUE. The R "help" simply states that you use "long TRUE to choose one generator, FALSE to choose another one"; however I could not find any documentation that described what the difference between those generators is.
Any clarification would be greatly appreciated! Thanks, Mary
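The distinction Ben draws between counts and a sequence of draws can be seen with base R alone. The sketch below uses sample() for the sequence, which is one standard way to do it; whether it matches the reply from Erik Iverson is not shown in this thread. The seed and variable names are illustrative.

```r
set.seed(123)
p <- c(0.06, 0.38, 0.56)

# A sequence of 100 draws, each taking the value 0, 1 or 2
draws <- sample(0:2, size = 100, replace = TRUE, prob = p)

# The counts per category for one sample of size 100 (a 3 x 1 matrix)
counts <- rmultinom(1, size = 100, prob = p)

# Tabulating the sequence gives counts of the same kind
table(factor(draws, levels = 0:2))
```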
Re: [R] [PS] How to manipulate data according to groups ?
Look at ?tapply; based on your description, it is what you want.

Ben

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ng Stanley
Sent: Thursday, March 13, 2008 9:25 AM
To: r-help
Subject: [PS] [R] How to manipulate data according to groups ?

Hi,

I have two-column data: the first column holds the values and the second column the groups. For this example, there are 3 groups: 1, 2, 3. How can I manipulate the values in the first column according to groups? Say I would like to find the mean, sum, and standard deviation for the different groups. How do I plot data according to groups?

> t <- matrix(c(rnorm(10), 1,1,2,2,1,3,3,3,3,2), ncol=2)
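A short sketch of the tapply() approach on data shaped like the poster's. It assumes a data frame named dat rather than a matrix named t (which would mask the transpose function); the seed, the column names and the boxplot are illustrative choices.

```r
set.seed(42)
dat <- data.frame(value = rnorm(10),
                  group = c(1, 1, 2, 2, 1, 3, 3, 3, 3, 2))

# One summary value per group
grp_mean <- tapply(dat$value, dat$group, mean)
grp_sum  <- tapply(dat$value, dat$group, sum)
grp_sd   <- tapply(dat$value, dat$group, sd)
cbind(mean = grp_mean, sum = grp_sum, sd = grp_sd)

# One possible by-group plot: a box per group
boxplot(value ~ group, data = dat)
```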
Re: [R] [PS] Re: a more elegant way to get percentages? (now R books)
Monica -- There has been a virtual population explosion of R books in recent years and we all have our favorites. You may wish to pick one oriented toward your specialty, but the absolute minimum lowest common denominator (by which I mean that it has the ground zero essential information that all users must share, not that it is minimal or incomplete) is the manual "An Introduction to R," available by download from the Cran website. Beyond that, my favorite introduction is Peter Dalgaard's "Introductory Statistics with R." He has an elegance and clarity of style, as well as a feel for what is necessary to include in an introduction, that some others lack. Others may disagree, but I find myself returning to Dalgaard again and again. Ben -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Monica Pisica Sent: Thursday, March 13, 2008 9:05 AM To: Gabor Grothendieck; [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: [PS] Re: [R] a more elegant way to get percentages? Hi everybody, I am amazed how quick i got my answer ;-) I have to recognize that Gabor's code really puts to shame my skills in doing any programming in R. Is there any book or documentation which really explains in details all these neat tricks from {stats} like ave (i even didn't know this function existed), apply and all its friends (sapply, tapply, etc) ? To be honest it took me quite a while to come up with the "fancy" subscripting to get my persantages ;-)) thank you so much, i really appreciate your help, Monica > Date: Thu, 13 Mar 2008 09:45:05 -0400 > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: Re: [R] a more elegant way to get percentages? 
> CC: r-help@r-project.org > > Assuming your x is as follows: > > x <- data.frame(locat = c("a", "b", "b", "c", "c", "c", "c", "d", "d", "d"), > val = c(5, 5, 15, 5, 20, 5, 10, 5, 15, 10)) > > Try this: > > x$percent1 <- ave(x$val, x$locat, FUN = function(x) 100*x/sum(x)) > > On Thu, Mar 13, 2008 at 9:36 AM, Monica Pisica wrote: >> >> Hi, >> >> I am trying to get percentages in a more elegant way. I have a data.frame with locations and values (counts) of species at that location. Each location is repeated for each species i have values for and i would like to get percentages of each species at that location. I am not sure if i am clear in my explanations so i will paste my code below: >> >> # >> >>> x >> locat val >> 1 a 5 >> 2 b 5 >> 3 b 15 >> 4 c 5 >> 5 c 20 >> 6 c 5 >> 7 c 10 >> 8 d 5 >> 9 d 15 >> 10 d 10 >>> loc1 <- x$locat >>> n <- length(loc1) >>> locuniq1 <- unique(loc1) >>> m <- length(locuniq1) >>> counts <- seq(1:m) >>> >>> for (i in 1:m) { >> + count <- 0 >> + for (j in 1:n) { >> + if (loc1[j]==locuniq1[i]) count <- count+1 >> + counts[i] <- count >> + } >> + } >>> >>> percent1 <- rep(0,n) >>> j <- 0 >>> for (i in 1:m) { >> + >> + b <- x[(j+1):(j+counts[i]),] >> + total <- sum(b$val) >> + percent1[(j+1):(j+counts[i])] <- round(apply(as.matrix(b$val), 1, function(x) {x*100/total}),2) >> + j = j+counts[i] >> + } >>> x1 <- cbind(x, percent1) # this is the result i want >>> x1 >> locat val percent1 >> 1 a 5 100.00 >> 2 b 5 25.00 >> 3 b 15 75.00 >> 4 c 5 12.50 >> 5 c 20 50.00 >> 6 c 5 12.50 >> 7 c 10 25.00 >> 8 d 5 16.67 >> 9 d 15 50.00 >> 10 d 10 33.33 >>> >> >> >> I am wondering if there is any way to do it more efficiently, much more that the first loop which gives how many times each location is present in the data.frame is slow if you have a larger data.frame and not only 10 rows. 
>> >> Thanks for any input and sorry if the email is on the long side, >> >> Monica >> >> >> _ >> [[elided Hotmail spam]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> _ 08 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
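Because the quoted code in this thread has lost its line breaks, here is a runnable restatement of Gabor's ave() suggestion on Monica's data. The round() call and the table() line are added here as assumptions: the first reproduces her two-decimal output, the second stands in for her counting loop.

```r
x <- data.frame(locat = c("a", "b", "b", "c", "c", "c", "c", "d", "d", "d"),
                val   = c(5, 5, 15, 5, 20, 5, 10, 5, 15, 10))

# ave() applies FUN within each level of locat and returns a vector
# aligned with the original rows
x$percent1 <- round(ave(x$val, x$locat,
                        FUN = function(v) 100 * v / sum(v)), 2)
x

# Number of rows per location, without an explicit loop
table(x$locat)
```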
Re: [R] [PS] Generating a new matrix using rbinom and a matrix of probabilities.
I tried your code and could not get it to run on my installation of R, so I may be missing something. But if you have a matrix of probabilities (call it probs) and want to simulate random binomial draws, can you not simply create a matrix of the same size of uniform random numbers (runif()) (call it rands), then do comparisons, thus, draws <- 0 + (rands < probs) Ben Fairbank -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Economics Guy Sent: Tuesday, March 11, 2008 10:16 AM To: [EMAIL PROTECTED] Subject: [PS] [R] Generating a new matrix using rbinom and a matrix of probabilities. I am having a little trouble getting R to do something without writing a couple of very awkward loops. I have a matrix of probabilities and I want to generate a new matrix of ones and zeros where each element in the new matrix is the result of a draw from a binomial distribution where the probability of getting a 1 is the corresponding element in the matrix of probabilities. Example Code: ## First I generate the matrix of probabilities for example purposes. probMatrix <- matrix(NA,5,5){ for (i in 1:5) probVectorI <- runif(5,0,1) probMatrix[i,] <- probVectorI } # Now I want to take each element in probMatrix and use it as the probability parameter in rbinom draw and generate a new matrix. Something like this: binomialMatrix <- rbinom(1,1,probMatrix) # But that does not work. I know I can run a loop across each vector of the matrix, but this seems like an bad way to do this. ---End Code So any help would be appreciated. Thanks, EG __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
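The comparison trick described in the reply can be written out as below. The second form relies on rbinom() accepting a vector of probabilities, provided the number of draws requested equals the number of matrix elements (the original call asked for only one draw). The names probs, rands, draws and draws2 are illustrative.

```r
set.seed(1)
probs <- matrix(runif(25), nrow = 5, ncol = 5)  # example probability matrix

# Ben's approach: compare against uniform random numbers of the same shape;
# adding 0 turns the logical matrix into 0/1 values
rands <- matrix(runif(25), nrow = 5, ncol = 5)
draws <- 0 + (rands < probs)

# Alternative sketch: one Bernoulli draw per element, reshaped afterwards
draws2 <- matrix(rbinom(length(probs), size = 1, prob = probs),
                 nrow = nrow(probs))
```

Both results have the same dimensions as probs, and each element is 1 with the corresponding probability.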
Re: [R] [PS] discrete variable
Try table(), with the name of your vector inside the parentheses.

Ben

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pete Dorothy
Sent: Sunday, March 02, 2008 2:27 PM
To: r-help@r-project.org
Subject: [PS] [R] discrete variable

Hello,

I am sorry for asking such a basic question. I could not find an answer to it using Google. I have a discrete variable (a vector x) taking, for example, the following values: 0, 3, 4, 3, 15, 5, 6, 5

Is it possible to know how many different values (modalities) it takes? Here it takes 6 different values, but the length of the vector is 8. I would like to know if there is a way to get the set of the modalities {0,3,4,15,5,6} with the number of times each one is taken {1,2,1,1,2,1}.

Thank you very much

P.S.: Are there some useful functions for discrete variables?
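Applied to the poster's vector, the suggestion looks like this. unique() is added as a companion for listing the distinct values in order of first appearance. Note that table() reports the values in sorted order, so its counts line up with 0, 3, 4, 5, 6, 15 rather than with the order used in the question.

```r
x <- c(0, 3, 4, 3, 15, 5, 6, 5)

unique(x)           # distinct values, in order of first appearance
length(unique(x))   # how many distinct values there are
tab <- table(x)     # counts per distinct value, sorted by value
tab
```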
Re: [R] [PS] Column sums from a data frame (without the headers)
as.vector(colSums(x)), where x is your data frame.

Ben

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jason Horn
Sent: Friday, February 29, 2008 11:03 AM
To: [EMAIL PROTECTED]
Subject: [PS] [R] Column sums from a data frame (without the headers)

Does anyone know how to get a vector of column sums from a data frame? You can use colSums(), but this gives you an object of type "numeric" with the column labels in the first row, and the sums in the second row. I just want a vector of the sums, and I can't figure out a way to index the "numeric" object.

Thanks!
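What colSums() returns is a named numeric vector: the "first row" of labels in the printout is just the names attribute. The sketch below uses a made-up data frame df to show indexing by name or position and two ways of dropping the names.

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4), c = c(10, 20, 30))

s <- colSums(df)   # named numeric vector, one element per column
s["b"]             # index by name
s[2]               # or by position

unname(s)          # same values with the names removed
as.vector(s)       # also drops the names
```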
[R] Optimization when only binary variables can be manipulated?
I am trying to optimize in situations such as the following: Given 100 ability test items with such known item values as (1) difficulty, (2) correlation with criterion, (3) position in subject matter taxonomy, (4) illustrated/nonillustrated, (5) abstraction level, and (6) length, I seek to make three 20-item tests that are as nearly identical in their properties (difficulty, illustrations, taxonomy, etc) as possible, using each item only once. (The goal is to make the tests interchangeable; there are approx 2.6 e50 such sets of tests.) I have an expression for the merit of the extent to which the tests are identical, but since all of the manipulated variables are binary (i.e., each item is "in" or "out" of each of the three tests), derivative-based methods seem not to apply. I have read through the optimization chapter in MASS, but those methods appear not to cover this situation. Can any of the R optimization packages handle optimization when the manipulated variables are binary and numerous? With thanks for any suggestions, Ben Fairbank Technical Director Sinclair Customer Metrics [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
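One family of methods that does not need derivatives is local search over swaps: start from a random assignment and exchange two items whenever that improves the merit. The sketch below is only an illustration under loud assumptions. The property matrix props is random stand-in data, the merit function is a hypothetical placeholder for the poster's own expression, and plain hill climbing can stall in a local optimum. Base R's optim() also offers method = "SANN" (simulated annealing), where the gr argument supplies a function that generates a new candidate, which may suit the same kind of problem.

```r
set.seed(1)
n_items <- 100
props <- matrix(rnorm(n_items * 3), n_items, 3)  # hypothetical item properties

# member[i] is 0 (item unused) or 1..3 (the test the item belongs to)
member <- sample(rep(0:3, times = c(40, 20, 20, 20)))

# Hypothetical merit: how much the per-test property means differ
# (smaller is better); replace with the real criterion
merit <- function(m) {
  means <- sapply(1:3, function(k) colMeans(props[m == k, , drop = FALSE]))
  sum(apply(means, 1, var))   # rows = properties, columns = tests
}

start <- merit(member)
best  <- start
for (iter in 1:5000) {
  ij <- sample(n_items, 2)
  if (member[ij[1]] == member[ij[2]]) next  # swap would change nothing
  cand <- member
  cand[ij] <- cand[rev(ij)]                 # swapping keeps test sizes fixed
  m <- merit(cand)
  if (m < best) {                           # accept improvements only
    member <- cand
    best <- m
  }
}
c(start = start, best = best)
```

Swapping two items, rather than flipping single in/out indicators, keeps every test at exactly 20 items and uses each item at most once without extra constraints.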
[R] "hist" combines two lowest categories -- is there a workaround?
When preparing a series of histograms I found that hist was combining the two lowest categories or bins, 1 and 2. Specifying breaks, as illustrated below, resulted in the correct histogram: values <- sample(10,500,replace=TRUE) hist(values) hist(values,breaks = 0:10) Apparently, the number of values strictly less than 1 is shown in the first bin (and since none is less than 1, the value is 0), while the other bins appear to show the number of values less than or equal to the bin's upper bound. Is there a setting that will show the number of values less than or equal to the first bin's upper bound? And, while on the subject of hist, what commands govern the axis label line that shows the values of x? Is there an option that will cause it to show all values from lowest to highest rather than by jumps of 2 or 5? With thanks for any suggestions Version 2.5.0, Windows XP professional Ben Fairbank [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
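A likely explanation, based on the documented defaults of hist() (right = TRUE and include.lowest = TRUE): with breaks at 1, 2, ..., 10 the bins are closed on the right, and the first bin is additionally closed on the left, so it covers [1, 2] and collects both the 1s and the 2s. A sketch of one workaround for integer data is to centre each bin on an integer and draw the x axis by hand; the half-integer breaks and the axis() call are illustrative choices.

```r
set.seed(1)
values <- sample(10, 500, replace = TRUE)

# Breaks at 0.5, 1.5, ..., 10.5 give exactly one bin per integer value;
# xaxt = "n" suppresses the default axis so that every value can be labelled
h <- hist(values, breaks = seq(0.5, 10.5, by = 1), xaxt = "n")
axis(1, at = 1:10)
```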
Re: [R] [PS] Different results in calculating SD of 2 numbers
And another problem, in addition to the points made by others, is that the formula for the SD gives a biased estimate (it underestimates it) of the population SD for small n when sampling from a normal distribution. When n is about twelve or so or more, the bias can usually be ignored (it is about 2.2%), but when you have only two numbers, the correction factor is about 1.25. The approximate correction formula, as I understand it, is (n-.75)/(n-1), so if n = 2, then it is 1.25/1, but this is not exact. The "real" formula is more complex (not difficult, but involves the gamma function) and my reference to it is not at this office, or I would give it. HTH, Ben -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ron Michael Sent: Wednesday, January 16, 2008 2:15 AM To: [EMAIL PROTECTED] Subject: [PS] [R] Different results in calculating SD of 2 numbers Hi all, Can anyone tell me why I am getting different results in calculating SD of 2 numbers ? > (1.25-0.95)/2 [1] 0.15 > sd(c(1.25, 0.95)) [1] 0.2121320 # why it is different from 0.15? Regards, Send instant messages to your online friends http://uk.messenger.yahoo.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
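Two points from this thread can be checked numerically. First, 0.15 is what results when the sum of squared deviations is divided by n, whereas sd() divides by n - 1. Second, the exact small-sample factor for normal data is usually written c4(n) and involves the gamma function; the function name c4 and the comparison table below are an editorial sketch, to be checked against a reference before relying on it.

```r
x <- c(1.25, 0.95)
n <- length(x)

sd(x)                             # divisor n - 1
sqrt(sum((x - mean(x))^2) / n)    # divisor n, equals (1.25 - 0.95) / 2

# For normal samples E[s] = c4(n) * sigma, so s / c4(n) is unbiased for sigma
c4 <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

nn <- c(2, 5, 12)
cbind(n = nn,
      exact  = 1 / c4(nn),             # exact correction factor
      approx = (nn - 0.75) / (nn - 1)) # approximation quoted in the reply
```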