[R] system command to a specific shell (bash)
I need to run a bash command, but when you call system() the default shell is sh (see my sessionInfo below). I found the shell command ( http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/base/html/shell.html ) but it seems to have disappeared in current versions of R? I am running all this from R CMD BATCH with system calls to other R scripts.

For a little more info, I'm generating sphinx documents (a python documentation library) through R and need to use a python virtual environment. So I need to call system('source bin/activate'), but source isn't a recognized command in the sh shell...

Any help is appreciated,
Justin

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics grDevices utils datasets stats grid methods base

other attached packages:
[1] ggplot2_0.9.0 reshape2_1.2.1 plyr_1.7.1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.1 MASS_7.3-16 memoise_0.1 munsell_0.3
 [7] proto_0.3-9.2 RColorBrewer_1.0-5 scales_0.2.0 stringr_0.6 tools_2.15.0

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] system command to a specific shell (bash)
Thanks Jeff, but I'm running a python program that expects certain functionality that bash provides and sh doesn't... I can just stop using github checkouts and use system packages instead, which would fix this.

I'm mostly wondering where the shell command went in base R... it sounds like it completely solves this issue, but it doesn't exist in my R.

On Mon, Apr 16, 2012 at 10:58 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

You could make a hash bang bash script that sources the file and then proceeds to do whatever you want. Bourne shell should have no problems invoking another shell.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
DCN: jdnew...@dcn.davis.ca.us
---
Sent from my phone. Please excuse my brevity.
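On the question of where shell() went: as far as I know, shell() only exists in Windows builds of R, which would explain why it is missing on Linux. Below is a minimal sketch of calling bash explicitly through system2(), assuming a bash executable is on the PATH. The helper name run_bash and the commented sphinx-build line (with its paths) are hypothetical, not something from this thread.

```r
# Run a command string under bash rather than the default sh.
# Assumption: a `bash` executable is available on the PATH.
run_bash <- function(cmd) {
  system2("bash", c("-c", shQuote(cmd)), stdout = TRUE)
}

# `source` is a bash builtin (POSIX sh spells it `.`), and the activation
# only lasts for that one bash process, so the command that needs the
# virtualenv has to run in the same call. Hypothetical paths:
# run_bash("source bin/activate && sphinx-build -b html docs docs/_build")

# Simple check that the command really runs under bash:
bash_version <- run_bash("echo $BASH_VERSION")
```

Each system() or system2() call starts a fresh process, so activating the environment in one call and running python in the next would not carry the activation over.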
Re: [R] A little exercise in R!
Since I thought this was a cool question, I posted it to StackOverflow. Vincent Zookynd's answer is amazing and really exercises the power of R.

http://stackoverflow.com/questions/10150161/ordering-117-by-perfect-square-pairs/10150797#10150797

On Fri, Apr 13, 2012 at 10:06 PM, Bert Gunter gunter.ber...@gene.com wrote:

... and a moment's more consideration immediately shows it cannot be done for n = 18, since 16, 17, and 18 cannot all be at an end.

-- Bert

On Fri, Apr 13, 2012 at 9:59 PM, Bert Gunter bgun...@gene.com wrote:

Folks:

IMHO this is exactly the **wrong** way to go about this. These are mathematical exercises that should employ mathematical thinking, not brute force checking of cases.

Consider, for example, the 1 to 17 sequence given by Ted. Then 17 **must** be one end of the sequence and 16 the other. (Why?) Hence, starting from the 17 end, the values **must** be 17 8 1 ... Proceeding in this way, it takes only a couple of minutes to solve.

The more interesting point, which I think the question was really about, is: can this always be done? I haven't given this any thought, but there may be an easy proof or counterexample. If the answer to this latter is no, then perhaps even more interesting is to characterize the set of numbers where it can/cannot be done.

But this is all way off topic, no?

Cheers,
Bert

On Fri, Apr 13, 2012 at 6:26 PM, Philippe Grosjean phgrosj...@sciviews.org wrote:

Hi all,

I got another solution, and it would probably qualify as the ugliest one :-( I made it general enough so that it works for any series from 1 to n (n not too large, please... tested up to 30).

Hint for a better algorithm: inspect the object 'friends' in my code: there is a nice pattern appearing there!!!

Best,
Philippe

Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons University, Belgium
findSerie <- function (n, tmax = 500) {
  ## Check arguments
  n <- as.integer(n)
  if (length(n) != 1 || is.na(n) || n < 1)
    stop("'n' must be a single positive integer")
  tmax <- as.integer(tmax)
  if (length(tmax) != 1 || is.na(tmax) || tmax < 1)
    stop("'tmax' must be a single positive integer")

  ## Suite of our numbers to be sorted
  nbrs <- 1:n

  ## Trivial cases: only one or two numbers
  if (n == 1) return(1)
  if (n == 2) stop("The pair does not sum to a square number")

  ## Compute all possible pairs
  omat <- outer(rep(1, n), nbrs)
  ## Which pairs sum to a square number?
  friends <- sqrt(omat + nbrs) %% 1 < .Machine$double.eps
  diag(friends) <- FALSE # Eliminate pairs of same numbers

  ## Get a list of possible neighbours
  neigb <- apply(friends, 1, function(x) nbrs[x])
  ## Nbr of neighbours for each number
  nf <- sapply(neigb, length)

  ## Are there numbers without neighbours?
  ## then, problem impossible to solve..
  if (any(!nf))
    stop("Impossible to solve:\n", paste(nbrs[!nf], collapse = ", "),
         " sum to square with nobody else!")

  ## Are there numbers that can have only one neighbour?
  ## Must be placed at one extreme
  toEnds <- nbrs[nf == 1]
  ## I must have two of them maximum!
  l <- length(toEnds)
  if (l > 2)
    stop("Impossible to solve:\n",
         "More than two numbers form only one pair:\n",
         paste(toEnds, collapse = ", "))

  ## The other numbers can appear in the middle of the suite
  inMiddle <- nbrs[!nbrs %in% toEnds]

  generateSerie <- function (neigb, toEnds, inMiddle) {
    ## Allow to generate serie by picking candidates randomly
    if (length(toEnds) > 1) toEnds <- sample(toEnds)
    if (length(inMiddle) > 1) inMiddle <- sample(inMiddle)

    ## Choose a number to start with
    res <- rep(NA, n)

    ## Three cases: 0, 1, or 2 numbers that must be at an extreme
    ## Following code works in all cases
    res[1] <- toEnds[1]
    res[n] <- toEnds[2]

    ## List of already taken numbers
    taken <- toEnds

    ## Is there one number in res[1]? Otherwise, fill it now...
    if (is.na(res[1])) {
      taken <- inMiddle[1]
      res[1] <- taken
    }

    ## For each number in the middle, choose one acceptable neighbour
    for (ii in 2:(n-1)) {
      prev <- res[ii - 1]
      allpossible <- neigb[[prev]]
      candidate <- allpossible[!(allpossible %in% taken)]
      if (!length(candidate)) break # We fail to construct the serie
      ## Take randomly one possible candidate
      if (length(candidate) > 1) take <- sample(candidate, 1) else take <- candidate
      res[ii] <- take
      taken <- c(taken, take)
    }

    ## If we manage to go to the end, check last pair...
    if (length(taken) == (n - 1)) {
      take <- nbrs[!(nbrs %in% taken)]
      res[n] <- take
      taken
Re: [R] A little exercise in R!
I thought this was kinda cool! Here's my solution. It's not robust or probably efficient, and I'd love to hear improvements or other solutions!

Justin

sq.test <- function(a, b) {
  ## test for number pairs that sum to squares.
  sqrt(sum(a, b)) == floor(sqrt(sum(a, b)))
}

ok.pairs <- function(n, vec) {
  ## given n as a member of vec,
  ## which other members of vec satisfy sq.test
  vec <- vec[vec != n]
  vec[sapply(vec, sq.test, b = n)]
}

grow.seq <- function(y) {
  ## given a starting point (y) and a pairs list (pl)
  ## grow the squaring sequence.
  ly <- length(y)
  if (ly == y[1]) return(y)
  ## this line is the one that breaks down on other number sets...
  y <- c(y, max(pl[[y[ly]]][!pl[[y[ly]]] %in% y]))
  y <- grow.seq(y)
  return(y)
}

## start vector
x <- 1:17
## get list of possible pairs
pl <- lapply(x, ok.pairs, vec = x)
## pick start at max since few combinations there.
y <- max(x)
grow.seq(y)

On Fri, Apr 13, 2012 at 2:34 PM, Ted Harding ted.hard...@wlandres.net wrote:

Greetings all!

A recent news item got me thinking that a problem stated therein could provide a teasing little exercise in R programming.

http://www.bbc.co.uk/news/uk-england-cambridgeshire-17680326

Cambridge University hosts first European 'maths Olympiad' for girls

The first European girls-only mathematical Olympiad competition is being hosted by Cambridge University. [...]

Olympiad co-director, Dr Ceri Fiddes, said competition questions encouraged "clever thinking rather than regurgitating a taught syllabus". [...]

"A lot of Olympiad questions in the competition are about proving things," Dr Fiddes said. "If you have a puzzle, it's not good enough to give one answer. You have to prove that it's the only possible answer." [...]

"In the Olympiad it's about starting with a problem that anybody could understand, then coming up with that clever idea that enables you to solve it," she said. "For example, take the numbers one up to 17. Can you write them out in a line so that every pair of numbers that are next to each other, adds up to give a square number?"

Well, that's the challenge: Write (from scratch) an R program that solves this problem. And make it neat.

NOTE: If there should happen to be some R package that can solve this kind of problem already, without you having to think much, then its use is illegitimate! (I.e. will be deemed "regurgitation").

Over to you.

With best wishes,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 13-Apr-2012 Time: 22:33:43
This message was sent by XFMail
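For comparison with the greedy grow.seq approach above, here is a small sketch of a depth-first search with backtracking, which does not depend on always picking the largest available neighbour. It is only one possible approach, not the method from the StackOverflow answer, and the names is_square and square_path are made up for this illustration.

```r
# TRUE where k is a perfect square (vectorised over k).
is_square <- function(k) {
  r <- round(sqrt(k))
  r * r == k
}

# Search for an ordering of 1..n in which every adjacent pair sums to a
# perfect square. Returns the ordering, or NULL if none exists.
square_path <- function(n) {
  nbrs <- seq_len(n)
  # neighbours[[i]]: the numbers j != i for which i + j is a perfect square
  neighbours <- lapply(nbrs, function(i) nbrs[nbrs != i & is_square(nbrs + i)])

  extend <- function(path) {
    if (length(path) == n) return(path)
    last <- path[length(path)]
    for (cand in setdiff(neighbours[[last]], path)) {
      found <- extend(c(path, cand))
      if (!is.null(found)) return(found)
    }
    NULL  # dead end: backtrack
  }

  for (start in nbrs) {
    found <- extend(start)
    if (!is.null(found)) return(found)
  }
  NULL  # no ordering exists for this n
}

res17 <- square_path(17)
res18 <- square_path(18)  # NULL: 16, 17 and 18 each have a single neighbour
```

The n = 18 case matches Bert's argument: three numbers would each have to sit at an end of the line, so the search comes back empty.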
Re: [R] Remove carriage return in writing tab-delimited file.
Take a look at ?paste

paste(yourmatrix, sep='\t', collapse='')

On Wed, Apr 4, 2012 at 2:58 PM, kickout plant.breeding.cr...@gmail.com wrote:

Having problems with the write.table function. I can write a tab-delimited file just fine, but for each line in my matrix it inserts a carriage return when I don't want it to. For example my matrix might be:

ID        V1  V2  V3
FARY1004  1   2   3
FARY2067  2   3   1
FARY4587  2   2   2

And I want the written file to be:

FARY1004 1 2 3FARY2067 2 3 1FARY4587 2 2 2

TIA
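An alternative that stays with write.table: its eol argument controls what is written at the end of each row, so setting it to an empty string should suppress the line breaks. A minimal sketch, with the example matrix rebuilt as a data.frame and a temporary file standing in for the real output path (both are assumptions for illustration):

```r
# Example data rebuilt from the question.
m <- data.frame(ID = c("FARY1004", "FARY2067", "FARY4587"),
                V1 = c(1, 2, 2), V2 = c(2, 3, 2), V3 = c(3, 1, 2))

out_file <- tempfile(fileext = ".txt")  # stand-in for the real file name

# eol = "" writes nothing between rows; sep = "\t" keeps the tab delimiter.
write.table(m, out_file, sep = "\t", eol = "", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the raw contents back to inspect them.
written <- readChar(out_file, file.info(out_file)$size)
```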
Re: [R] sampling rows from a list
## recreating your data
mydata <- list(matrix(1:9, nrow=3, byrow=TRUE),
               matrix(10:15, nrow=2, byrow=TRUE),
               matrix(16:30, nrow=5, byrow=TRUE))

## get the shortest matrix in your list
n <- min(unlist(lapply(mydata, nrow)))

## subset the list into random samples of length n
out <- lapply(mydata, function(x, n) x[sample(1:nrow(x), n),], n=n)

## this structure is still a list though...
## converting directly to an array:
out.array <- array(unlist(out), dim=c(dim(out[[1]]), length(out)))

Not totally sure what structure you're wanting in the last step, so if I missed, I apologize...

Hope that helps,
Justin

On Mon, Apr 2, 2012 at 11:24 AM, Bcampbell99 briand.campb...@ec.gc.ca wrote:

Hi:

I'm sure this seems like a rudimentary question, but I am not well versed with R syntax for lists. I have a ragged array from which I've removed records (entire rows) with missing data. The functions I used to remove the missing cases resulted in the generation of an R list class object that looks something like this:

mydata
[[1]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

[[2]]
     [,1] [,2] [,3]
[1,]   10   11   12
[2,]   13   14   15

[[3]]
     [,1] [,2] [,3]
[1,]   16   17   18
[2,]   19   20   21
[3,]   22   23   24
[4,]   25   26   27
[5,]   28   29   30

Part 1: What I would like to do is draw an equal number of random row samples from [[1]], [[2]] and [[3]] (to preserve the structure of [,1], [,2], [,3]).

Part 2: Then I would like to coerce the list object into something like an array.

Help scripting out part 1 or 2 would be much appreciated.

Brian Campbell
Re: [R] list assignment syntax?
You can also take a look at http://stackoverflow.com/questions/7519790/assign-multiple-new-variables-in-a-single-line-in-r which has some additional solutions.

On Fri, Mar 30, 2012 at 4:49 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

On 2012-03-30 15:40, ivo welch wrote:

Dear R wizards: is there a clean way to assign to elements in a list? What I would like to do, in pseudo R+perl notation, is

f <- function(a,b) list(a+b,a-b)
(c,d) <- f(1,2)

and have c be assigned 1+2 and d be assigned 1-2. Right now, I use the clunky

x <- f(1,2)
c <- x[[1]]
d <- x[[2]]
rm(x)

which seems awful. Is there a nicer syntax?

regards, /iaw

Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)

I must be missing something. Why not just assign to a vector instead of a list?

f <- function(a,b) c(a+b,a-b)

If it's imperative that f return a list, then you could use

(c, d) <- unlist(f(a, b))

to get vector (c, d).

Peter Ehlers
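One base R option that gets close to the requested syntax is to have the function return a named list and unpack it with list2env(). This is only a sketch: the element names s and d are chosen here for illustration (d rather than c, so the base function c() is not masked), and it is not necessarily one of the solutions in the linked StackOverflow thread.

```r
# Return a *named* list so each element can become a variable.
f <- function(a, b) list(s = a + b, d = a - b)

# Create variables `s` and `d` in the current environment in one step.
list2env(f(1, 2), envir = environment())
```

After this call, s holds 1+2 and d holds 1-2, with no temporary x to clean up.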
Re: [R] scanning data into r
What have you tried? What type of file are you trying to import from? What do you want your data to look like in R?

Take a look at ?read.table and ?readLines

On Wed, Mar 28, 2012 at 11:23 AM, joel.green joel.gr...@live.co.uk wrote:

Hey, I am having trouble importing data into R. My data field looks like this:

21 TEST DATA 32 year:2012 33 34 5 36

I require the number at the start of each line; however, the text is not needed. I am struggling to get R to import the data without changing the file itself. How do I import the data? I have tried using comment.char= , however this didn't work. Any help would be much appreciated, thanks
Re: [R] Why does this work? plyr within-subset normalization
To those without access to nabble, the code in reference is:

relative <- ddply(ranktable, .(Timestamp), function(x) data.frame(relative = x[,5]/max(x[,5])))

I may be misunderstanding your question, but: ddply splits your data.frame, ranktable, by the column Timestamp into many smaller data.frames, one for each unique Timestamp value. Those new small data.frames are sent one at a time to the function you specify. So, when you call max(x[,5]) you're taking the max of the data.frame sent to the function rather than the max of the larger ranktable data.frame.

On Wed, Mar 28, 2012 at 10:18 AM, z2.0 zack.abraham...@gmail.com wrote:

Working code that normalizes each row's value against the subset's maximum. Does the invocation of max() somehow instruct R to 'step back' and evaluate the subset?

Thanks,
Zack
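The same split-then-apply idea can be seen in base R without plyr. The sketch below uses a small made-up stand-in for ranktable (the column name value and the data are assumptions), and split() plus lapply() only mimic conceptually what ddply does:

```r
# Hypothetical stand-in for ranktable: two timestamps, one value column.
ranktable <- data.frame(Timestamp = c("t1", "t1", "t2", "t2", "t2"),
                        value = c(2, 4, 5, 10, 20))

# One small data.frame per unique Timestamp.
pieces <- split(ranktable, ranktable$Timestamp)

# The function only ever sees one piece, so max() is the per-group maximum.
relative <- lapply(pieces, function(x) x$value / max(x$value))
```

Nothing about max() is special here; it simply never sees the rows belonging to the other timestamps.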
Re: [R] how to match exact phrase using gsub (or similar function)
In most regexes the caret ( ^ ) signifies the start of a line and the dollar sign ( $ ) signifies the end.

gsub('^S S', 'S', a)
gsub('^S S', 'S', '3421 BIGS St')

You can use a logical or inside your pattern too:

gsub('^S S|S S$| S S ', 'S', a)

The ' S S ' condition is difficult.

gsub('^S S|S S$| S S ', 'S', 'foo S S bar')

gives the wrong output, as does:

gsub('^S S | S S$| S S ', ' S ', 'foo S S bar')
gsub('^S S | S S$| S S ', ' S ', a)

so you might have to catch that with a second gsub.

gsub(' S S ', ' S ', 'foo S S bar')

On Wed, Mar 28, 2012 at 12:32 PM, Markus Weisner r...@themarkus.com wrote:

Trying to switch out addresses that have double directions, such as the following example:

a = "S S Main St Interstate 95"
a = gsub(pattern="S S ", replacement="S ", a)

… the problem is that I don't want to affect instances where this might be a correct address, such as the following:

3421 BIGS St

What I want to say is switch out only if this is either of the following situations:

[beginning of char]S S S S S S[end of char]

Is there any way of making gsub or a similar function make the replacements I want?

Thanks in advance for your help.

~Markus
Re: [R] how to match exact phrase using gsub (or similar function)
Wow! And here I thought I was starting to know most things about regexes...

On Wed, Mar 28, 2012 at 1:34 PM, William Dunlap wdun...@tibco.com wrote:

You can use the \< and \> patterns (backslashing the backslashes) to mean start and end of "word", respectively. E.g.,

> addresses <- c("S S Main St Interstate 95", "3421 BIGS St")
> gsub("\\<S S\\>", "S", addresses)
[1] "S Main St Interstate 95" "3421 BIGS St"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
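A short check of the word-boundary pattern against the case from earlier in the thread that needed a second gsub, where the doubled direction sits in the middle of the string. The variable names are only for illustration:

```r
addresses <- c("S S Main St Interstate 95",  # doubled direction at the start
               "3421 BIGS St",               # must stay untouched
               "foo S S bar")                # doubled direction mid-string

# \\< and \\> anchor the match to the start and end of a word, so the
# "S S" hidden inside "BIGS St" is not a candidate.
fixed <- gsub("\\<S S\\>", "S", addresses)
```

With the boundaries in place, one pattern covers the start, middle and end cases without padding the replacement with spaces.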
Re: [R] Convert day of year back into a date format.
There may very well be a better solution, but this works.

format(strptime(dayofyear, format="%j"), format="%m-%d")

On Tue, Mar 27, 2012 at 11:12 AM, Sam Albers tonightstheni...@gmail.com wrote:

Hello,

I am having trouble figuring out how to convert a Day of Year integer back into a Date format. For example I have the following:

date <- c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05','2008-01-06','2008-01-07',
          '2008-01-08','2008-01-09','2008-01-10','2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
          '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20','2008-01-21','2008-01-22','2008-01-23')

## this is then converted into a number corresponding to the day of the year like so:
dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1

## Now my question is how do I get back to a date format (obviously omitting the year).
## The end result is that I'd like to be able to have axis labels as something like Month-Day or just Month
## instead of just an integer, which isn't always intuitive for people, but I can't seem to figure out how to tell R
## to recognize an integer as a date. Any suggestions?

Many thanks in advance!

Sam
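One caveat with the "%j" approach is that strptime has to assume a year (the current one, as far as I know), which matters after February in leap years. If the year is known, as it is in the question (2008), a sketch that counts days from January 1 of that year avoids the ambiguity. The variable names below are just for illustration:

```r
# Day-of-year values; 60 is included because 2008 is a leap year.
dayofyear <- c(1, 2, 23, 60)

# Day 1 is the origin itself, so subtract 1 before adding to the origin.
as_date <- as.Date(dayofyear - 1, origin = "2008-01-01")

# Month-day labels suitable for an axis.
labels <- format(as_date, "%m-%d")
```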
Re: [R] Remove a word from a character vector value XXXX
Hadley's package stringr is wonderful for all things string.

library(stringr)

?str_trim and ?str_replace are what you want. (The base R equivalent of these two would be ?gsub and some regular expressions.)

str_trim(str_replace(d5.Region, 'Average', ''))

should do the trick.

Hope that helps,
Justin

On Wed, Mar 7, 2012 at 8:03 AM, Dan Abner dan.abne...@gmail.com wrote:

Hi everyone,

What is the easiest way to remove the word "Average" and strip leading and trailing blanks from the character vector (d5.Region) below?

  .nrow.d5.           d5.Region
1         1     Central Average
2         2     Coastal Average
3         3        East Average
4         4  Metro East Average
5         5 Metro North Average
6         6 Metro South Average
7         7  Metro West Average
8         8   Northeast Average
9         9   Northwest Average

Thanks!
Dan
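For anyone who would rather not load a package, the base R route mentioned above might look like the following sketch. The vector region is a shortened stand-in for d5.Region:

```r
# Shortened stand-in for the d5.Region column.
region <- c("Central Average", "Metro East Average", "Northwest Average")

# Drop the literal word, then strip leading and trailing whitespace.
no_average <- sub("Average", "", region, fixed = TRUE)
cleaned <- gsub("^\\s+|\\s+$", "", no_average)
```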
Re: [R] logical to vector?
?as.numeric

> as.numeric(c(TRUE, FALSE))
[1] 1 0

On Wed, Mar 7, 2012 at 8:02 AM, Ed Siefker ebs15...@gmail.com wrote:

I am trying to use the coXpress function from the coXpress package. This function requires numerical vectors indicating which columns are in which group. The problem is, I can only figure out how to get a logical structure, not a numerical one.

In other words, coXpress wants something like:

1:3

I have something like:

TRUE TRUE TRUE FALSE FALSE

Can I convert one into the other easily?
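Reading the question again, the goal seems to be column positions (1:3) rather than 0/1 values, in which case which() may be the closer fit. A small sketch contrasting the two; in_group is a made-up name for the logical vector:

```r
in_group <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

as_zero_one <- as.numeric(in_group)  # 1 1 1 0 0: one value per element
positions   <- which(in_group)       # 1 2 3: indices of the TRUE elements
```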
Re: [R] GPS handling libraries or (String manipulation)
Take a look at: http://cran.r-project.org/web/views/Spatial.html

But I've always just parsed the string... This is from the last time I did this. It's not quite the same, but you can see the similarities.

## if data is presented as 43°02'46.60059" N need to split on the ° symbol, ' and "
library(stringr)  # for str_split

to.decimal <- function(vec){
  # convert all symbols to _
  vec <- gsub('°','_',vec)
  vec <- gsub('\'','_',vec)
  vec <- gsub('\"','_',vec)
  split <- str_split(vec,'_')
  deg <- as.numeric(sapply(split,'[',1))
  min <- as.numeric(sapply(split,'[',2))
  sec <- as.numeric(sapply(split,'[',3))
  deg <- deg + min/60 + sec/3600
  return(deg)
}

On Wed, Mar 7, 2012 at 8:28 AM, Alaios ala...@yahoo.com wrote:

Dear all,

I would like to ask you if R has a library that can work with different GPS formats. For example I have a string of this format

N50° 47.513 E006° 03.985

and I would like to convert it to GPS decimal format. That means, for example, converting the part N50° 47.513 to 50 + 47/60 + 513/3600.

Is it possible to do that with R? What is the name of such a library?

I would like to thank you in advance for your help.

B.R
Alex
Re: [R] GPS handling libraries or (String manipulation)
Wow... that is WAY better! Thanks Gabor!

On Wed, Mar 7, 2012 at 8:51 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Use strapply to extract the digits and convert them to numeric, followed by matrix multiplication to apply the formula:

library(gsubfn)
x <- "N50° 47.513"
c(1, 1/60, 1/3600) %*% strapply(x, "\\d+", as.numeric, simplify = TRUE)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
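A word of caution on the formula: if the input is in degrees and decimal minutes (which "N50° 47.513" looks like, though that is an assumption about the data), then 47.513 is a number of minutes and the conversion would be 50 + 47.513/60, not 50 + 47/60 + 513/3600. A base R sketch under that assumption, with a hypothetical helper name:

```r
# Convert one "degrees + decimal minutes" string such as "N50° 47.513".
# Assumption: the second number is decimal minutes, not minutes.seconds.
ddm_to_decimal <- function(x) {
  # Pull out the numeric fields: degrees first, then minutes.
  nums <- as.numeric(regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))[[1]])
  # South and West are conventionally negative.
  sgn <- if (grepl("^[SW]", x)) -1 else 1
  sgn * (nums[1] + nums[2] / 60)
}

lat <- ddm_to_decimal("N50\u00b0 47.513")
lon <- ddm_to_decimal("E006\u00b0 03.985")
```

If the data really are degrees, minutes and seconds, the formula given in the question is the one to keep.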
Re: [R] regular expression
gsub('.+; (.+);.+','\\1',x)

or if you just want the value out:

gsub('.+; Surv\\(months\\): ([0-9]+);.+','\\1',x)

You can also look at strsplit:

> strsplit(x,';')
[[1]]
[1] "99-625: Cell type: S"        " Surv(months): 21"           " STATUS(0=alive, 1=dead): 1"

> lapply(strsplit(x,';'),'[',2)
[[1]]
[1] " Surv(months): 21"

But I would follow David's second suggestion and just read them in with sep=';' instead.

Justin

On Wed, Feb 29, 2012 at 11:24 AM, Fred G bayespoker...@gmail.com wrote:

Computer Friends,

with the following example lines:

[107] 98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1
[108] 99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1

I want to be able to isolate the number of months of survival for each row. Is there a regular expression that can find the first instance of a ;, delete everything in front of it, and find the second instance of a ; and delete everything behind it? In Python there is a function line.find(); I would be grateful to hear the R equivalent, or any better alternatives to get the number of months of survival stored as a variable.

Much Thank You!
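Putting the second pattern to work on both example lines and storing the result as a number, as the question asks. This is a sketch: the vector x simply holds the two sample lines from the question without the [107]/[108] print indices.

```r
x <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
       "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Keep only the digits that follow "Surv(months): ", then convert.
months <- as.numeric(sub(".*Surv\\(months\\): ([0-9]+);.*", "\\1", x))
```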
Re: [R] Problem building up ggplot graph in a loop.
ggplot is looking for thisData as a column of coffs. The most 'ggplotesque' way of doing this would be:

    # melt your data to a long format:
    coffs.melt <- melt(coffs, id.vars = 'levels')
    # plot using colour aes parameter:
    ggplot(coffs.melt, aes(x=levels, y=value, colour=variable)) + geom_line() + ylab('Total Chargeoffs')

This is untested since there is no sample data!

Justin

On Thu, Feb 16, 2012 at 2:50 PM, Keith Weintraub kw1...@gmail.com wrote:

Folks, I want to automate some graphing using ggplot. Here is my code:

    graphChargeOffs2 <- function(coffs) {
      ggplot(coffs, aes(levels))
      dataNames <- names(coffs)[!names(coffs) == "levels"]
      for(i in dataNames) {
        thisData <- coffs[[i]]
        last_plot() + geom_line(aes(y = thisData, colour = i))
      }
      last_plot() + ylab("Total Chargeoffs")
    }

coffs is a data.frame. I get the following error:

    Error in eval(expr, envir, enclos) : object 'thisData' not found

As little as I know about environments in R, I am pretty sure that the geom_line in the loop is not able to see the thisData variable. Any help you could provide would be appreciated. I would be surprised if there wasn't a way to pass the data into the geom_line function without using environments. Of course I have been wrong once or twice in the past. :) Note that geom_line also can't see the input variable coffs. Thanks for any and all help.
Re: [R] Change dataframe-structure
There is probably a more elegant way, but:

    df <- data.frame(p1=c(1,2,1),p2=c(3,3,2),p3=c(2,1,3),p4=c(5,6,4),p5=c(4,4,6),p6=c(6,5,5))
    as.data.frame(t(apply(df,1,function(x) names(x)[match(1:6,x)])))
      V1 V2 V3 V4 V5 V6
    1 p1 p3 p2 p5 p4 p6
    2 p3 p1 p2 p5 p6 p4
    3 p1 p2 p3 p4 p6 p5

On Mon, Feb 13, 2012 at 2:07 PM, David Studer stude...@gmail.com wrote:

Hello everybody, I have the following problem and have no idea how to solve it. In my dataframe I have six columns representing six societal problems (p1, p2, ..., p6). The values are ranks between 1 (worst problem) and 6 (best problem):

    p1 p2 p3 p4 p5 p6
     1  3  2  5  4  6
     2  3  1  6  4  5
     1  2  3  4  6  5

but I'd like the dataframe the other way round:

     1  2  3  4  5  6
    p1 p3 p2 p4 p4 p6
    p3 p1 p2 p5 p6 p4
    p1 p2 p3 p4 p6 p5

Can anyone help? Thanks!
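Because each row holds every rank from 1 to 6 exactly once, match(1:6, x) gives the same positions as order(x). A small sketch of that variant, with column labels rank1..rank6 chosen here purely for illustration:

```r
df <- data.frame(p1 = c(1, 2, 1), p2 = c(3, 3, 2), p3 = c(2, 1, 3),
                 p4 = c(5, 6, 4), p5 = c(4, 4, 6), p6 = c(6, 5, 5))

# For each row, list the problem names from rank 1 to rank 6.
by_rank <- t(apply(df, 1, function(x) names(df)[order(x)]))
colnames(by_rank) <- paste0("rank", 1:6)

as.data.frame(by_rank)
```

order() only agrees with match() when there are no ties or missing ranks, so the original match() version is the safer choice if the data might contain either.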
Re: [R] debug in a loop
You can add

    if(is.na(tab[i])) browser()

or

    if(is.na(tab[i])) break

See inline.

On Fri, Feb 10, 2012 at 7:22 AM, ikuzar raz...@hotmail.fr wrote:

Hi, I'd like to debug in a loop (using debug() and browser() etc., but not print()). I'm looking for the first occurrence of NA. For instance:

    tab = c(1:300)
    tab[250] = NA
    len = length(tab)
    for (i in 1:len){
      if(i != len){
        if(is.na(tab[i])) browser()
        tab[i] = tab[i]+tab[i+1]
      }
    }

I do not want to do Browse[2] n for each step. I'd like to declare a browser() in the loop with a condition. But how do I write "stop running when you encounter NA"? Thanks for your help

-- View this message in context: http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html
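A non-interactive sketch of the same idea, using break so it can run in a script: the loop records the index where the condition first fires, and which() finds the first NA in the untouched vector without any loop. The names hit and first_na are illustrative only.

```r
tab <- c(1:300)
tab[250] <- NA

# Without a loop: position of the first NA in the original vector.
first_na <- which(is.na(tab))[1]

# Inside the loop: stop at the first element that is already NA.
hit <- NA
for (i in seq_len(length(tab) - 1)) {
  if (is.na(tab[i])) {   # swap 'break' for browser() to inspect interactively
    hit <- i
    break
  }
  tab[i] <- tab[i] + tab[i + 1]
}

first_na
hit
```

Note that the step before the stop has already written NA into tab[249], because it added tab[250]; that is why checking before the update matters.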
Re: [R] Memory allocation problem (again!)
32 bit Windows has a memory limit of 2GB. Upgrading to a computer that's less than 10 years old is the best path. But short of that, if you're just generating random data, why not do it in two or more pieces and combine them later?

    mat.1 <- matrix(rnorm(50000*2000),nrow=50000)
    mat.2 <- matrix(rnorm(50000*2000),nrow=50000)
    mat.3 <- matrix(rnorm(50000*2000),nrow=50000)
    mat.1.sums <- rowSums(mat.1)
    mat.2.sums <- rowSums(mat.2)
    mat.3.sums <- rowSums(mat.3)
    mat.sums <- c(mat.1.sums,mat.2.sums,mat.3.sums)

On Wed, Feb 8, 2012 at 8:37 AM, Christofer Bogaso bogaso.christo...@gmail.com wrote:

Dear all, I know this problem has been discussed many times in this forum, but unfortunately I could not find any way out for my own problem. I am having a memory allocation problem while generating a lot of random numbers. Here is my description:

    rnorm(50000*6000)
    Error: cannot allocate vector of size 2.2 Gb
    In addition: Warning messages:
    1: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)
    2: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)
    3: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)
    4: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)
    memory.size(TRUE)
    [1] 15.75
    rnorm(50000*6000)
    Error: cannot allocate vector of size 2.2 Gb
    In addition: Warning messages:
    1: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)
    2: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)
    3: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)
    4: In rnorm(50000 * 6000) : Reached total allocation of 1535Mb: see help(memory.size)

And the session info is here:

    sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: i386-pc-mingw32/i386 (32-bit)
    locale:
    [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252
    attached base packages:
    [1] graphics grDevices utils datasets grid stats methods base
    other attached packages:
    [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6 zoo_1.7-6
    loaded via a namespace (and not attached):
    [1] lattice_0.20-0

I am using Windows 7 (home version) with 4 GB of RAM (2.16GB is usable, as my computer reports). So in my case, is it not possible to generate a random vector of such length? Note that generating such a vector is my primary job. Later I need to do something with that vector. Those jobs include:

1. Create a matrix with 50,000 rows.
2. Get the row sums.
3. Then report some metrics on those sum values (min. 50,000 elements must be there).

Can somebody help me with some real solution/suggestion? Thanks and regards,
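If the goal is one sum per row of the full matrix (an assumption about what step 2 in the question means), the pieces can be generated one block of columns at a time and their row sums added, so only one block is ever in memory. A scaled-down sketch with small stand-in sizes; n_rows, n_cols and chunk are illustrative names:

```r
set.seed(1)
n_rows <- 500    # stand-in for 50,000
n_cols <- 60     # stand-in for 6,000
chunk  <- 20     # columns generated per block

row_sums <- numeric(n_rows)
for (start in seq(1, n_cols, by = chunk)) {
  width <- min(chunk, n_cols - start + 1)
  block <- matrix(rnorm(n_rows * width), nrow = n_rows)
  # Accumulate: adding the partial sums gives the sum over all columns.
  row_sums <- row_sums + rowSums(block)
}

length(row_sums)
summary(row_sums)
```

Concatenating the partial sums with c(), as in the reply above, instead yields one value per row per block; which of the two is wanted depends on the metrics being reported.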
Re: [R] Help need
Instead of a for loop, why not use the vectorization inherent in R?

    sigmasqaured <- 1
    i <- complex(real = 0, imaginary = 1)
    f <- seq(0,0.5,0.1)
    spectrum <- (sigmasqaured)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)
    spectrum
    [1] 9.632720e+00 1.411130e+03 2.947753e+00 6.479994e-02 1.295175e-02 8.042731e-03

On Tue, Feb 7, 2012 at 1:08 PM, Jaymin Shah jayminsh...@live.com wrote:

I have made a for loop to try to output values which I have named spectrum. However, I cannot seem to get the answers to come out as a vector, which is what I need. They come out as separate values which I am then unable to join together. Thank you

    for(f in seq(0,0.5,0.1)) {
      sigmasqaured <- 1
      i = complex(real = 0, imaginary = 1)
      spectrum <- (sigmasqaured)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)
      print(spectrum)
    }
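If a per-frequency calculation is still preferred, wrapping the formula in a function and calling it through sapply() also returns a vector. A sketch, where spec_at is a hypothetical helper name and the coefficients are copied from the post:

```r
sigmasquared <- 1
i <- complex(real = 0, imaginary = 1)

# Spectrum at a single frequency f (also works on a whole vector of f).
spec_at <- function(f) {
  sigmasquared / abs(1 - 2.7607 * exp(2 * pi * i * f) +
                       3.8106 * exp(4 * pi * i * f) -
                       2.6535 * exp(6 * pi * i * f) +
                       0.9258 * exp(8 * pi * i * f))^2
}

freqs    <- seq(0, 0.5, 0.1)
spectrum <- sapply(freqs, spec_at)   # one value per frequency, as a vector
spectrum
```

The original loop lost its results because spectrum was overwritten on every pass; collecting the values (here via sapply) is what turns them into a vector.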
Re: [R] I bet apply has a solution
How about:

    apply(Data..,1, function(vec) !all(vec==vec[1]))
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE

On Mon, Feb 6, 2012 at 10:34 AM, LCOG1 jr...@lcog.org wrote:

Hi all, for the data below I would like to return a logical value indicating differences in the data.

    #Create data
    Data.. <- data.frame(a=rep(1,10),b=c(rep(1,9),2),c=c(rep(1,8),2,2))
       a b c
    1  1 1 1
    2  1 1 1
    3  1 1 1
    4  1 1 1
    5  1 1 1
    6  1 1 1
    7  1 1 1
    8  1 1 1
    9  1 1 2
    10 1 2 2

So what I want is to return a logical value telling me if all the values are the same. The result would be:

       a b c DidChange
    1  1 1 1     FALSE
    2  1 1 1     FALSE
    3  1 1 1     FALSE
    4  1 1 1     FALSE
    5  1 1 1     FALSE
    6  1 1 1     FALSE
    7  1 1 1     FALSE
    8  1 1 1     FALSE
    9  1 1 2      TRUE
    10 1 2 2      TRUE

I bet apply could handle this elegantly, but that family of functions is still not 100% intuitive to me. Thoughts? Thanks everyone. Cheers, Josh

-- View this message in context: http://r.789695.n4.nabble.com/I-bet-apply-has-a-solution-tp4362294p4362294.html
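To end up with the DidChange column shown in the question, the result of apply() can be assigned straight back into the data frame. A sketch using an equivalent test (more than one distinct value in the row):

```r
Data.. <- data.frame(a = rep(1, 10), b = c(rep(1, 9), 2), c = c(rep(1, 8), 2, 2))

# TRUE when a row contains more than one distinct value.
Data..$DidChange <- apply(Data..[, c("a", "b", "c")], 1,
                          function(vec) length(unique(vec)) > 1)
Data..
```

The columns are selected explicitly so that re-running the line does not feed the new logical column back into the comparison.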
Re: [R] Select elements from text
How about using read.table(... , sep=' ')? That would give you a vector of single words. Then grepl('\\[[9-z]+\\]',x) will return a boolean vector:

    x <- c('test','[bracket]','hi]','[blah','foo','[bar]')
    grepl('\\[[9-z]+\\]',x)
    [1] FALSE TRUE FALSE FALSE FALSE TRUE
    x[grepl('\\[[9-z]+\\]',x)]
    [1] "[bracket]" "[bar]"

You might need a more complex regex to catch them all, in case of ([citation]) instances for example.

Justin

On Tue, Jan 24, 2012 at 6:52 AM, mdvaan mathijsdev...@gmail.com wrote:

Hi, I have a series of MS Word files and each file contains plain text. From these texts I would like to extract only those elements (read: words) that are between square brackets. Example of a text:

Most fundamentally, it has led to an effort to clarify the organizational form concept. According to them [see also Smith, Jones and Carroll 2002], categories emerge as audience members recognize dissimilarities among groups of consumers and label them as members of a common set [Nicol 2000].

Now I would like to get the following selection:

    see also Smith, Jones and Carroll 2002
    Nicol 2000

Any ideas on how to do this? What would be the best way to import the text in R? The entire text as an element in a dataframe? Thank you very much! Best, Mathijs

-- View this message in context: http://r.789695.n4.nabble.com/Select-elements-from-text-tp4323947p4323947.html
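Since the bracketed pieces in the example contain spaces, splitting into single words first would break them apart. An alternative sketch that keeps the text as one string and pulls out each bracketed span with gregexpr() and regmatches(); the lazy pattern and perl = TRUE are choices made here, not something from the thread:

```r
txt <- paste("According to them [see also Smith, Jones and Carroll 2002],",
             "categories emerge as audience members recognize dissimilarities",
             "among groups of consumers and label them as members of a",
             "common set [Nicol 2000].")

# Each match is the shortest run from '[' to the next ']'.
hits <- regmatches(txt, gregexpr("\\[.*?\\]", txt, perl = TRUE))[[1]]

# Drop the surrounding brackets.
inside <- substring(hits, 2, nchar(hits) - 1)
inside
```

This does not handle nested brackets; text read from several files could be processed the same way by passing a character vector and keeping the list that regmatches() returns.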
Re: [R] drop columns whose rows are all 0
    dataset <- data.frame(a=1:10,b=c(0,0,0,1,0,0,0,0,1,0),c=rep(0,10))
    apply(dataset,2,function(x) all(x==0))
        a     b     c
    FALSE FALSE  TRUE
    dataset[,!apply(dataset,2,function(x) all(x==0))]
        a b
    1   1 0
    2   2 0
    3   3 0
    4   4 1
    5   5 0
    6   6 0
    7   7 0
    8   8 0
    9   9 1
    10 10 0

On Tue, Jan 24, 2012 at 8:14 AM, Francisco franciscororol...@google.com wrote:

Hello, I have a dataset with 40 variables; some of them are always 0 (in each row). I would like to make a subset containing only the columns whose values are not all 0, but I don't know how to do it. I tried:

    for(cut_column in 1:40) {
      if(sum(dataset[,cut_column])!=0) {
        columns_useful <- c(columns_useful,dataset[cut_column])
      }
    }
    sorted_dataset <- subset(dataset, select=columns_useful)

But it doesn't work. Thank you, Francisco
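The same selection can be written without apply() by counting non-zero entries per column. A sketch on the toy data from the reply, with drop = FALSE added so the result stays a data frame even if only one column survives:

```r
dataset <- data.frame(a = 1:10, b = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0), c = rep(0, 10))

# A column is kept if it has at least one non-zero value.
keep    <- colSums(dataset != 0) > 0
nonzero <- dataset[, keep, drop = FALSE]

names(nonzero)
```

Testing for "any non-zero value" is safer than sum(column) != 0 as in the question, since positive and negative values could cancel out in a sum.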
Re: [R] How can I access information stored after I run a command in R?
?str tells you about the object.

    str(MAX3(a,'asy',1))

From that you can see the names of the various parts, including p.value.

    foo <- MAX3(a,'asy',1)$p.value

On Mon, Jan 23, 2012 at 9:32 AM, Tiago V. Pereira tiago.pere...@mbe.bio.br wrote:

Dear all, suppose I run the following commands:

    ###
    #install.packages("Rassoc", dependencies=TRUE)
    library(Rassoc)
    ca=c(139,249,112)
    co=c(136,244,120)
    a=rbind(ca,co)
    MAX3(a,"asy",1)
    ##

I get:

    The MAX3 test using the asy method
    data: a
    statistic = 0.5993, p-value = 0.7933

How can one save the result 0.7933 into a file? Say:

    foo <- 0.7933
    write.table(foo, file = "/home/foo.txt", sep = " ", row.names=FALSE, col.names=TRUE, quote=FALSE, qmethod = "double")

However, instead of typing the value above, I would like to replace it with the macro (scalar, local) that has the accurate p-value. Thanks in advance for your help. Tiago
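The printed result in the question looks like R's usual hypothesis-test output, so it is presumably a list with a p.value element, as the reply says. The same inspect-then-extract pattern is sketched here with t.test() from base R as a stand-in (Rassoc is not needed to try it), writing to a temporary file rather than the path in the question:

```r
# Stand-in test object; MAX3() from the thread is assumed to behave similarly.
res <- t.test(c(5.1, 4.9, 5.6, 5.8, 6.0), mu = 5)

names(res)          # lists the components, including "p.value"
p <- res$p.value    # extract the unrounded p-value

out <- tempfile(fileext = ".txt")
write.table(data.frame(p.value = p), file = out,
            row.names = FALSE, col.names = TRUE, quote = FALSE)
readLines(out)
```

Extracting the component rather than retyping the printed number keeps full precision, since print() rounds for display.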
Re: [R] colored outliers
    TOC_NI <- read.csv2("C:/Users/hilliges/Desktop/Master/Daten/Statistik/TOC-NI.csv", sep=";", dec=",", encoding="UTF-8")
    circ <- TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]
    plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
    abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
    points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3) ## this line is coloring all points because you're using TOC_NI still
    points(NI~TOC,data=circ,col='red',pch=1,size=3) ## now we're only plotting the four points in circ

Sorry for the confusion. However, in the future please provide a reproducible data set along with your question so we can more easily help.

Justin

On Fri, Jan 20, 2012 at 5:49 AM, Geophagus falk.hilli...@twain-systems.com wrote:

Dear Petr and Justin, my problem is that I only want to have the 4 highest values for Ni as a red point or with a red circle. The other points should not be modified. In your proposals, all points always get a red circle or a red point, not only the 4 highest Ni values! I hope you can understand me! Thanks for your help! GeO

-- View this message in context: http://r.789695.n4.nabble.com/colored-outliers-tp4282207p4313278.html
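Since the CSV file is not available, here is a self-contained sketch of the corrected approach on made-up data (the values below are random stand-ins, not the poster's measurements). It uses cex to size the circles, the standard base-graphics argument for symbol size, in place of size:

```r
set.seed(42)
# Synthetic stand-in for the TOC-NI table.
TOC_NI <- data.frame(TOC = runif(50, 0, 450), NI = rexp(50, rate = 1 / 20))

# The four rows with the highest NI.
circ <- TOC_NI[order(TOC_NI$NI, decreasing = TRUE), ][1:4, ]

pdf(NULL)  # draw to a null device so the sketch runs without a display
plot(NI ~ TOC, data = TOC_NI, col = "blue", pch = 16, xlim = c(0, 450))
abline(lm(NI ~ TOC, data = TOC_NI), col = "red", lwd = 3)
points(NI ~ TOC, data = circ, col = "red", pch = 1, cex = 2)  # circles on top 4 only
invisible(dev.off())
```

Drop the pdf(NULL) and dev.off() lines to see the plot on screen.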
Re: [R] Stacked barchart in ggplot (or other library)
To use ggplot:

    dat <- data.frame(num=1:3,usage=c(4,2,5),cap=c(10,20,10),diff=c(6,18,5))
    dat.melt <- melt(dat,id.var=c('num','cap'))
    ggplot(dat.melt)+geom_bar(aes(x=num,y=value,fill=variable),stat='identity')

On Fri, Jan 20, 2012 at 12:30 PM, Jean V Adams jvad...@usgs.gov wrote:

Bart6114 wrote on 01/20/2012 08:54:39 AM:

Hey, I want to create a stacked barchart in R for the following dataset (http://pastebin.com/pyHUNgr2):

    # usage capacity diff
    1     4       10    6
    2     2       20   18
    3     5       10    5

The stacked barchart should, in one plot, show each line of the dataset as a stacked bar, using data from 'usage' and 'diff' to create the stacked bar. I can't find a good example of how to do this on the ggplot2 site. Thanks in advance!

See the help on barplot: ?barplot. For example:

    df <- data.frame(usage=c(4, 2, 5), capacity=c(10, 20, 10), diff=c(6, 18, 5))
    barplot(t(as.matrix(df[, 1:2])))

Jean
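The question asks for bars built from 'usage' and 'diff'. In base graphics that means passing those two columns, transposed so each row of the data becomes one bar. A sketch selecting the columns by name, with a null device so it runs without a display:

```r
df <- data.frame(usage = c(4, 2, 5), capacity = c(10, 20, 10), diff = c(6, 18, 5))

# barplot() stacks the rows of a matrix within each column (= each bar).
heights <- t(as.matrix(df[, c("usage", "diff")]))

pdf(NULL)
barplot(heights, names.arg = 1:3, legend.text = TRUE)
invisible(dev.off())

colSums(heights)   # total bar heights; here they equal 'capacity'
```

Stacking usage and capacity instead would give bars taller than the capacity itself, which is probably not what is wanted.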
Re: [R] Establishing groups using something other than ifelse()
How about:

    levels(df$z)[grep('A',levels(df$z))] <- 'A'
    levels(df$z)[grep('B',levels(df$z))] <- 'B'
    levels(df$z)[grep('C',levels(df$z))] <- 'C'

Does that do what you're wanting?

On Thu, Jan 19, 2012 at 3:05 PM, Sam Albers tonightstheni...@gmail.com wrote:

Hello all, this is one of those "Is there a better way to do this?" questions. Say I have a dataframe (df) with a grouping variable (z). This is my base data. Now I know that there is a higher order level of grouping that exists for my group variable. So what I want to do is create a new column that expresses that higher order level of grouping, based on values in the sub-group (z in this case). In the past I have used ifelse(), but this tends to get fairly redundant and messy with a large number of sub-groupings (z). I've created a sample dataset below. Can anyone recommend a better way of achieving what I am currently achieving with ifelse()? A long series of ifelse statements makes me think that there is something better for this.

    ## Dataframe creation
    df <- data.frame(x=runif(36, 0, 120), y=runif(36, 0, 120), z=factor(c("A1","A1","A2","A2","B1","B1","B2","B2","C1","C","C2","C2")) )
    ## Current method is grouping
    df$Big.Group <- with(df, ifelse(df$z=="A1","A", ifelse(df$z=="A2","A", ifelse(df$z=="B1", "B", ifelse(df$z=="B2", "B", "C")))))

So any suggestions? Thanks in advance! Sam
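The levels() assignments above rewrite z itself. To keep z and add the coarser grouping as a new column, the levels can be relabelled on a copy, or the leading letter can be taken directly. A sketch that assumes every sub-group name is a letter followed by digits (the sample data here uses "C1" throughout for simplicity):

```r
df <- data.frame(z = factor(c("A1", "A1", "A2", "A2", "B1", "B1",
                              "B2", "B2", "C1", "C1", "C2", "C2")))

# Option 1: first character of each label.
df$Big.Group <- factor(substr(as.character(df$z), 1, 1))

# Option 2: copy the factor and strip trailing digits from its levels;
# duplicate level names are merged automatically.
big <- df$z
levels(big) <- sub("[0-9]+$", "", levels(big))

table(df$z, df$Big.Group)
```

For groupings that do not follow a naming pattern, a small lookup vector (sub-group name to big-group name) indexed by as.character(df$z) scales better than nested ifelse() calls.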
[R] png output on a server?
I've got R running on a gentoo server that doesn't have X11 installed. It's a custom build to keep those dependencies at bay! However, some of my scripts use the base png() function and ggplot2. But png uses X11. A google search suggests using the Cairo package, which works... but changes the fonts (specifically the size of the font). Adjusting the pointsize doesn't seem to have much effect.

Aside from tuning the CairoPNG function to make my graphs look right, has anyone found a good way to avoid the X11 dependency but still use the base png function? If anyone has experience with CairoPNG and making it look like the base png function, I'd love to hear what you've learned!

Thanks, Justin

    capabilities()
        jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets   libxml     fifo   cledit    iconv      NLS  profmem
       FALSE    FALSE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE    FALSE
       cairo
       FALSE

    sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-linux-gnu (64-bit)
    locale:
    [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
    [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
    [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    attached base packages:
    [1] stats graphics grDevices utils datasets grid methods base
    other attached packages:
    [1] Cairo_1.5-1 ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.7.1
    loaded via a namespace (and not attached):
    [1] tools_2.14.1
Re: [R] Points inside a polygon
On Wed 11 Jan 2012 08:28:03 PM PST, Hasan Diwan wrote:

I have a list of bounds for a series of polygons. I do understand the formula to determine whether point i is within polygon X (X[x1] < i[x] && X[x2] > i[x] && X[y1] < i[y] && X[y2] > i[y]), and I can apply this throughout the dataset. However, this naive algorithm doesn't scale very well. The data set contains 10,000 points consisting of (n,e) pairs, where I'm interested in which are inside polygons denoted by vertices (V[x1]/V[y1],V[x2],V[y2]). Is there a shortcut to accomplish this goal? Many thanks! -- H

Check out the splancs package, particularly the inout function.

Justin
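If the polygons really are axis-aligned boxes given by two corners, as the formula in the question suggests, the test can be applied to all points at once with vectorised comparisons, which is usually fast enough for 10,000 points. A base-R sketch on made-up coordinates; pts and box are illustrative names:

```r
set.seed(1)
pts <- data.frame(x = runif(10000), y = runif(10000))   # stand-in points
box <- c(x1 = 0.2, x2 = 0.6, y1 = 0.1, y2 = 0.5)        # one bounding box

# One logical per point, computed without an explicit loop.
inside <- pts$x > box["x1"] & pts$x < box["x2"] &
          pts$y > box["y1"] & pts$y < box["y2"]

sum(inside)          # how many points fall in the box
head(pts[inside, ])
```

For polygons of arbitrary shape a true point-in-polygon routine is needed, which is what the inout function mentioned above is for.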
Re: [R] relative frequency plot using ggplot or other function
On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:

Hi, I have a data frame in the following form. There are two groups, and for each 'width' the relative frequency for group1 and group2 is given. How can I plot this in R using ggplot or another package?

      Width relativeFrequency1 relativeFrequency2
    1   100       0.0006388783         0.02265428
    2   200       0.0022677303         0.02948625
    3   300       0.0061182673         0.01739936
    4   400       0.0152237225         0.02569902
    5   500       0.0300215262         0.03639880
    6   600       0.0597610250         0.07717765

Thanks

Not sure exactly what you're looking for but...

    dat <- data.frame(width=1:6*100,rel1=runif(6), rel2=runif(6))
    dat.melt <- melt(dat,id.var='width')
    ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable))+geom_bar(stat='identity',position='dodge')
Re: [R] relative frequency plot using ggplot or other function
    ggplot(dat.melt,aes(x=width,y=value,fill=variable,colour=variable))+geom_density(stat='identity',alpha=0.5)

The fill and colour variables can be removed if you want. Or:

    ggplot(dat.melt,aes(x=width,y=value,fill=variable))+geom_density(stat='identity',alpha=0.5)+facet_wrap(~variable,ncol=1)

Same with this version.

On Thu, Jan 12, 2012 at 9:35 AM, Mary Kindall mary.kind...@gmail.com wrote:

Hi, this is exactly what I am looking for, but I do not want to draw it as a histogram; instead I want two separate plots for this data. Something like the ones shown in the following links. Please disregard the legends of the following figures.

http://had.co.nz/ggplot2/graphics/55078149a733dd1a0b42a57faf847036.png
http://had.co.nz/ggplot2/graphics/90983232ced45a93d9fbbe40afffd69a.png

Thanks

On Thu, Jan 12, 2012 at 12:13 PM, Justin Haynes jto...@gmail.com wrote:

On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:

Hi, I have a data frame in the following form. There are two groups, and for each 'width' the relative frequency for group1 and group2 is given. How can I plot this in R using ggplot or another package?

      Width relativeFrequency1 relativeFrequency2
    1   100       0.0006388783         0.02265428
    2   200       0.0022677303         0.02948625
    3   300       0.0061182673         0.01739936
    4   400       0.0152237225         0.02569902
    5   500       0.0300215262         0.03639880
    6   600       0.0597610250         0.07717765

Thanks

Not sure exactly what you're looking for but...

    dat <- data.frame(width=1:6*100,rel1=runif(6), rel2=runif(6))
    dat.melt <- melt(dat,id.var='width')
    ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable))+geom_bar(stat='identity',position='dodge')

-- Mary Kindall, Yorktown Heights, NY, USA
Re: [R] Add color to Boxplot by value
How about:

    dat <- data.frame(val=rnorm(100,12,10),x=letters[1:4])
    col.val <- ddply(dat,.(x),summarise,mean(val))
    col.val$breaks <- cut(col.val$..1,c(0,9,15,Inf))
    dat.merge <- merge(dat,col.val)
    ggplot(dat.merge,aes(x=x,y=val,colour=breaks))+geom_boxplot()+scale_color_manual(values=c('green','yellow','red'))

On Thu, Jan 12, 2012 at 7:45 AM, KWyshak kwys...@illumina.com wrote:

I have a boxplot of production run rates per 10 minute interval and I would like to color code them by the average (i.e. >15ppm = green, <9ppm = red, everything else yellow). Is there a way to do this?

http://r.789695.n4.nabble.com/file/n4289381/RunRateBoxWhisker.png

-- View this message in context: http://r.789695.n4.nabble.com/Add-color-to-Boxplot-by-value-tp4289381p4289381.html
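The key step is cut(), which bins each group mean and can label the bins with the colour names directly. A base-R sketch with made-up means; the threshold-to-colour mapping (low = red, high = green) is read from the question and is an assumption about what was intended:

```r
# Hypothetical per-group averages in ppm.
means <- c(a = 7.2, b = 12.5, c = 16.1, d = 9.0)

# Intervals are (-Inf, 9], (9, 15], (15, Inf], so exactly 9 counts as "red".
band <- cut(means, breaks = c(-Inf, 9, 15, Inf),
            labels = c("red", "yellow", "green"))
cols <- as.character(band)
cols
```

In the ggplot call above, the colours passed to scale_color_manual() are matched to the cut() levels in order, from the lowest interval to the highest, so their order determines which band gets which colour.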
Re: [R] colored outliers
    # find top 4 points
    circ <- TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]
    # add them to your plot!
    plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
    abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
    points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3)

Justin

On Tue, Jan 10, 2012 at 7:11 AM, Geophagus falk.hilli...@twain-systems.com wrote:

Hi all, I have a question about how to mark significant outliers in R. This is my very simple script to plot a regression:

    TOC_NI <- read.csv2("C:/Users/XYZ/Desktop/Master/Daten/Statistik/TOC-NI.csv", sep=";", dec=",", encoding="UTF-8")
    plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
    abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
    summary(lm(NI~TOC,data=TOC_NI))

The result is the following picture: http://r.789695.n4.nabble.com/file/n4282207/nickel_TOC_5f.png

Now I want to make small red circles around the four highest values of Ni. Does anyone have an idea how to do that? Thanks a lot! Best Regards, Geophagus

-- View this message in context: http://r.789695.n4.nabble.com/colored-outliers-tp4282207p4282207.html
Re: [R] colored outliers
Whoops! See inline. Hope that helps, and enjoy R.

Justin

On Tue, Jan 10, 2012 at 8:40 AM, Geophagus falk.hilli...@twain-systems.com wrote:

Hi Justin, thanks a lot for your quick answer. If I use your code, all points become red. How do you include the sorted and separated four values in the points argument? The variable in your script is called circ, but it is not used again. Here is the script again:

    TOC_NI <- read.csv2("C:/Users/hilliges/Desktop/Master/Daten/Statistik/TOC-NI.csv", sep=";", dec=",", encoding="UTF-8")

This line just needs trimming. Not sure how I missed that on my copy... Anyway, order puts the data.frame in the order of the given vector; the default behavior sorts in ascending order unless you specify decreasing=TRUE.

    circ <- TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]

and it should work

    plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
    abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
    points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3)

Thanks a lot for your help! GeO

-- View this message in context: http://r.789695.n4.nabble.com/colored-outliers-tp4282207p4282481.html
Re: [R] match matrices of different lengths
See ?merge

    merge(xx,aa,by.x='x',by.y='a')
                x   y   b
    1 2.00112e+11 1.0 1.2
    2 2.00112e+11 1.1 1.9

Making the two matrices time series does not mean that R knows that the first column is a datetime. And depending on your desired result, that may not be important. Hope that helps,

Justin

On Thu, Jan 5, 2012 at 5:51 AM, Thijs vanden Bergh bergh.thijsvan...@gmail.com wrote:

I was trying to match different matrices of different lengths, with date and time info (yearmonthdayhourminute) in the first column. The routine needs to return NAs where data of either of the matrices is nonexistent. I have been trying the following:

    x <- c(200112030003, 200112030004, 200112030005, 200112030006)
    y <- c(0.1, 1, 1.1, 1.5)
    a <- c(200112030004, 200112030005, 200112030007, 200112030008, 200112030009)
    b <- c(1.2, 1.9, 2.0, 2.5, 2.1)
    xx <- cbind(x, y)
    aa <- cbind(a, b)
    xxnew <- ts(xx)
    aanew <- ts(aa)
    cc <- ts.union(xxnew, aanew)
    cc

This does not, however, give the wished-for result, as it simply cbinds the two matrices and fills up the empty spots that are created, due to one matrix being shorter than the other, at the bottom end of the shorter matrix. I really want the routine to match matrix xx and aa by the time in the first column of both matrices. Any help towards this end would be much appreciated, th.
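By default merge() keeps only the timestamps present in both inputs. Since the question asks for NAs where either side has no data, the all = TRUE argument (a full outer join) may be closer to what is wanted. A sketch on the data from the question:

```r
x <- c(200112030003, 200112030004, 200112030005, 200112030006)
y <- c(0.1, 1, 1.1, 1.5)
a <- c(200112030004, 200112030005, 200112030007, 200112030008, 200112030009)
b <- c(1.2, 1.9, 2.0, 2.5, 2.1)

# Keep every timestamp from either matrix; missing values become NA.
both <- merge(cbind(x, y), cbind(a, b), by.x = "x", by.y = "a", all = TRUE)

# Show the full timestamps instead of scientific notation.
both$x <- format(both$x, scientific = FALSE)
both
```

all.x = TRUE or all.y = TRUE can be used instead to keep every row of just one of the two inputs.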
Re: [R] ggplot2 - tricky problem
How about:

dat <- data.frame(id=1:4,city=c('berlin','munich'),likeability=c(5,4,6,5),uniqueness=c(3,4,4,4))
ggplot(ddply(melt(dat, id.vars=c('id','city')), .(variable,city), summarise, value=mean(value)), aes(x=factor(city),y=value)) + geom_point() + facet_wrap(~variable)

The line drawing is a bit more tricky... Since the x values are factors rather than continuous, fitting a line to them is kind of nonsense; the result depends on which order the levels are in, for example. If instead you want to plot something like:

ggplot(dat,aes(x=likeability,y=uniqueness,colour=city))+geom_point()+geom_smooth(aes(group=city),method='lm')

you could draw fit lines that make a bit more sense. Forgive me if I'm oversimplifying your problem!

Justin

On Thu, Jan 5, 2012 at 7:46 AM, Mario Giesel rr.gie...@yahoo.de wrote:

Hello, R friends,
I've been struggling quite a bit with ggplot2. Having worked through Hadley's book twice, I still wonder how to solve this task.

1. Short example dataframe:

id  city    Likeability  Uniqueness
1   Berlin  5            3
2   Munich  4            4
3   Berlin  6            4
4   Munich  5            4

2. Task:
a) Facetting plots for each attitude (1 plot for likeability and uniqueness each, horizontally on one page)
b) Showing Berlin and Munich together on x axis
c) Showing the means of Berlin and Munich on y axis (means of cities in likeability on first plot, means of cities in uniqueness on second plot)
d) Drawing a line through mean points on each plot

Hope I could explain it understandably. Any help is appreciated!
Thanks a lot,
Mario
Re: [R] [newbie] stack operations, or functions with side effects (or both)
Do s[1] and s[-1] do what you're looking for? Those only display the values... If you want to change s, you need to reassign it or fiddle with namespacing. However, I'd say it is better to write R code as though data structures are immutable until you explicitly re-assign them, rather than trying to deal with side effects and state...

> pop <- function(vec){
+ print(vec[1])
+ print(vec[-1])
+ return(vec[-1])
+ }
> s <- 1:5
> s <- pop(s)
[1] 1
[1] 2 3 4 5
> s
[1] 2 3 4 5

On Wed, Jan 4, 2012 at 1:22 PM, Tom Roche tom_ro...@pobox.com wrote:

summary: Specifically, how does one do stack/FIFO operations in R? Generally, how does one code functions with side effects in R?

details: I have been a coder for years, mostly using C-like semantics (e.g., Java). I am now trying to become a scientist, and to use R, but I don't yet have the sense of good R and R idiom (i.e., expressions that are to R what (e.g.) the Schwartzian transform is to Perl).

I have a data-assimilation problem for which I see a solution that wants a stack--or, really, just a pop(...) such that

* s <- c(1:5)
* print(s)
[1] 1 2 3 4 5
* pop(s)
[1] 1
* print(s)
[1] 2 3 4 5

but in fact I get

> pop(s)
Error: could not find function "pop"

and Rseek'ing finds me nothing. When I try to write pop(...) I get

> pop1 <- function(vector_arg) {
+ length(vector_arg) -> lv
+ vector_arg[1] -> ret
+ vector_arg <- vector_arg[2:lv]
+ ret
+ }
> pop1(s)
[1] 1
> print(s)
[1] 1 2 3 4 5

i.e., no side effect on the argument

> pop2 <- function(vector_arg) {
+ length(vector_arg) -> lv
+ vector_arg[1] -> ret
+ assign("vector_arg", vector_arg[2:lv])
+ return(ret)
+ }
> pop2(s)
[1] 1
> print(s)
[1] 1 2 3 4 5

ditto :-( What am I missing?

* Is there already a stack API for R (which I would expect)? If so, where?
* How to cause the desired side effect to the argument in the code above?

TIA, Tom Roche tom_ro...@pobox.com
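For the side-effect part of the stack question, one possible sketch (not something proposed in the thread) keeps the state inside a closure, so that pop() can modify it with `<<-` without touching the caller's variables. The names make_stack, push, pop and size are hypothetical helpers, not an existing R API.

```r
# Hypothetical helper: a stack whose contents live in the closure's environment
make_stack <- function(items = NULL) {
  push <- function(x) {
    items <<- c(x, items)      # new element goes on top
    invisible(NULL)
  }
  pop <- function() {
    if (length(items) == 0) stop("stack is empty")
    top <- items[1]
    items <<- items[-1]        # updates the enclosed variable, not a local copy
    top
  }
  size <- function() length(items)
  list(push = push, pop = pop, size = size)
}

s <- make_stack(1:5)
s$pop()    # returns 1
s$size()   # 4 elements remain
```

The plain-vector approach in the reply (s <- pop(s)) stays closer to R's copy-on-modify semantics; the closure is only worth it when several functions must share the same mutable stack.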
Re: [R] Combining characters
apply(expand.grid(x, y, z, stringsAsFactors=F), 1, paste, collapse=' ')

On Wed, Jan 4, 2012 at 8:32 AM, jeremy jeremynamer...@gmail.com wrote:

Hi all,
I'm trying to combine exhaustively several character arrays in R like:

x=c("one","two","three")
y=c("yellow","blue","green")
z=c("apple","cheese")

in order to get concatenation of

x[1] y[1] z[1] (one yellow apple)
x[1] y[1] z[2] (one yellow cheese)
x[1] y[2] z[1] (one blue apple)
...
x[length(x)] y[length(y)] z[length(z)] (three green cheese)

Anyone has a solution ?
Thank in advance

View this message in context: http://r.789695.n4.nabble.com/Combining-characters-tp4261888p4261888.html
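A small variation on the expand.grid() one-liner in this thread, offered as a sketch. expand.grid() varies its first argument fastest, so listing z first reproduces the ordering in the question (apple, then cheese, within each colour). The column names z, y, x are chosen here for illustration.

```r
x <- c("one", "two", "three")
y <- c("yellow", "blue", "green")
z <- c("apple", "cheese")

# First argument varies fastest, so put the innermost vector first
grid <- expand.grid(z = z, y = y, x = x, stringsAsFactors = FALSE)

# paste() is vectorised, so no apply() over rows is needed
combos <- paste(grid$x, grid$y, grid$z)
head(combos, 3)
```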
Re: [R] a quick question about rbinom
homework or not, ?rbinom should be plenty.

On Wed, Jan 4, 2012 at 1:38 PM, lynn.tsai vernal@gmail.com wrote:

Hello,
I have the following code using rbinom, but I don't understand what *+1* means in the code. Could someone help? Thanks so much,

X1<-c("A","B")[rbinom(n,1,0.6)+1]
X2<-c("C","D")[rbinom(n,1,0.1)+1]

View this message in context: http://r.789695.n4.nabble.com/a-quick-question-about-rbinom-tp4262977p4262977.html
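To spell out the +1 in the rbinom question: rbinom(n, 1, p) returns zeros and ones, while R vectors are indexed from 1, so adding 1 turns each draw into a valid index into the two-element vector. A minimal sketch, with n and the seed chosen arbitrarily:

```r
set.seed(42)
n <- 10

draws <- rbinom(n, 1, 0.6)      # each draw is 0 or 1
X1 <- c("A", "B")[draws + 1]    # 0 -> element 1 ("A"), 1 -> element 2 ("B")

data.frame(draws, X1)
```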
Re: [R] Applyiing mode() or class() to each column of a data.frame XXXX
There is also colwise in the plyr package.

library(plyr)
colwise(class)(data6)
      v13     v14       v15     f4     v16
1 integer numeric character factor logical

Justin

On Thu, Dec 29, 2011 at 4:47 PM, Jean V Adams jvad...@usgs.gov wrote:

Dan Abner wrote on 12/29/2011 06:13:11 PM:

Hi everyone,
I am attempting to use the apply() function to obtain the mode and class of each column in a data frame, however, I am encountering unexpected results. I have the following example data:

v13<-1:6
v14<-c(1,2,3,3,NA,1)
v15<-c("Good","Bad",NA,"Good","Bad","Bad")
f4<-factor(rep(c("Blue","Red","Green"),2))
v16<-c(F,T,F,F,T,F)
data6<-data.frame(v13,v14,v15,f4,v16)
data6

Here is my function definition:

contents<-function(x){
output<-data.frame(Varnum=1:ncol(x), Name=names(x), Mode=apply(x,2,mode), Class=apply(x,2,class))
print(output)
}

Use sapply() instead of apply(). In the help file for apply() it says:

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

This coercion to a matrix might be causing the unexpected result. sapply() and lapply() are designed specifically for lists (which a data frame is). I also simplified the function a bit ...

contents<-function(x){
data.frame(Varnum=1:ncol(x), Name=names(x), Mode=sapply(x,mode), Class=sapply(x,class))
}

Jean

When I call the function, I obtain the following:

contents(data6)
    Varnum Name      Mode     Class
v13      1  v13 character character
v14      2  v14 character character
v15      3  v15 character character
f4       4   f4 character character
v16      5  v16 character character

Any help is appreciated.
Thank you,
Dan
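A compact sketch of the apply() versus sapply() difference discussed in this thread, using the example data from the question. Setting stringsAsFactors = FALSE explicitly is an assumption made here so the result does not depend on the R version.

```r
data6 <- data.frame(
  v13 = 1:6,
  v14 = c(1, 2, 3, 3, NA, 1),
  v15 = c("Good", "Bad", NA, "Good", "Bad", "Bad"),
  f4  = factor(rep(c("Blue", "Red", "Green"), 2)),
  v16 = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# apply() first coerces the data frame to a (character) matrix
via_apply <- apply(data6, 2, class)

# sapply() treats the data frame as a list of columns
via_sapply <- sapply(data6, class)

via_apply
via_sapply
```

Every entry of via_apply is "character" because the columns are inspected only after the coercion, whereas via_sapply reports each column's own class.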
Re: [R] Help with code
the short answer... which is a guess cause you didn't provide a reproducible example... is: your column (i think its called t1d_ptype[1:25]) is a factor and using factors is dangerous at best. you can check with ?str. see ?factor for how to convert back to strings and see if your code works. to answer your second question, yes I'm sure there is a better simple way to do this, but i can't follow what you're doing... for example, I don't know what c1 is... but, the place I would look is at the plyr package. its excellent at splitting and reordering data. and one final note, you should avoid naming things with pre-existing R functions (e.g. data). Justin On Tue, Dec 20, 2011 at 11:14 AM, 1Rnwb sbpuro...@gmail.com wrote: hello gurus, i have a data frame like this HTN HTN_FDR Dyslipidemia CAD t1d_ptype[1:25] 1Y YY T1D 2 T1D 3 Ctrl_FDR 4 T1D 5Y Ctrl 6 Ctrl 7 Ctrl_FDR 8 T1D 9YY T1D 10 T1D 11 Ctrl_FDR 12 YY T1D 13 Y YY T1D 14 T1D 15 Ctrl 16 Ctrl 17 Ctrl_FDR 18 T1D 19 T1D 20 Y T1D 21 Ctrl_FDR 22 Ctrl_FDR 23 Ctrl 24 Ctrl 25 T1D i am converting it to define the groups more uniformly using this code: for( i in 1:dim(c1)[1]) { num_comp-0 for (j in 1:dim(c1)[2]) if (c1[i,j]==2) num_comp=num_comp+1 #Y=2 for (j in 1:dim(c1)[2]) if(num_comp0) { if (data$t1d_ptype[i] == T1D c1[i ,j] == 2) c2[i,j]-T1D_w if (data$t1d_ptype[i] == T1D c1[i, j] == 1) c2[i,j]-T1D_oc if(substr(data$t1d_ptype[i],1,4) == Ctrl c1[i,j] == 2) c2[i,j]-Ctrl_w if (substr(data$t1d_ptype[i],1,4) == Ctrl c1[i,j] == 1) c2[i,j]-Ctrl_oc } else { if(data$t1d_ptype[i] == T1D) c2[i,j]-T1D_noc if(substr(data$t1d_ptype[i],1,4) == Ctrl) c2[i,j]-Ctrl_noc } } it is giving me error In `[-.factor`(`*tmp*`, iseq, value = structure(c(NA, ... : invalid factor level, NAs generated Also it there a simple way to do this. Thanks Sharad -- View this message in context: http://r.789695.n4.nabble.com/Help-with-code-tp4218989p4218989.html Sent from the R help mailing list archive at Nabble.com. 
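A hedged sketch of the two usual ways around the "invalid factor level" warning quoted in this thread: convert the factor to character before assigning new labels, or extend the factor's levels first. The vector ptype is made up for illustration and only imitates the poster's t1d_ptype column.

```r
# Hypothetical stand-in for the t1d_ptype column
ptype <- factor(c("T1D", "Ctrl", "Ctrl_FDR", "T1D"))
str(ptype)   # shows that this is a factor, not a character vector

# Option 1: work with plain strings
ptype_chr <- as.character(ptype)
ptype_chr[2] <- "Ctrl_noc"

# Option 2: keep the factor, but declare the new level before using it
ptype_fac <- factor(ptype, levels = c(levels(ptype), "Ctrl_noc"))
ptype_fac[2] <- "Ctrl_noc"
```

Assigning "Ctrl_noc" directly to ptype, without either step, is what produces the warning and an NA.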
Re: [R] Help with code
Fair enough and good point. How about, dangerous when used unknowingly! On Tue, Dec 20, 2011 at 1:01 PM, William Dunlap wdun...@tibco.com wrote: Re your column (i think its called t1d_ptype[1:25]) is a factor and using factors is dangerous at best. This depends on how you want to define dangerous. If t1d_ptype ought take values from a certain set of strings then making it a factor gives you some safety, since it warns you when you go outside of that set and try to give it an illegal value. E.g., sex - factor(c(M,F,F), levels=c(F, M)) sex[2] - no Warning message: In `[-.factor`(`*tmp*`, 2, value = no) : invalid factor level, NAs generated It does take more work to set up, since you need to enumerate the set of good strings. That is tedium, not danger. If t1d_ptype might take any value, then make it a character vector. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Justin Haynes Sent: Tuesday, December 20, 2011 11:54 AM To: 1Rnwb Cc: r-help@r-project.org Subject: Re: [R] Help with code the short answer... which is a guess cause you didn't provide a reproducible example... is: your column (i think its called t1d_ptype[1:25]) is a factor and using factors is dangerous at best. you can check with ?str. see ?factor for how to convert back to strings and see if your code works. to answer your second question, yes I'm sure there is a better simple way to do this, but i can't follow what you're doing... for example, I don't know what c1 is... but, the place I would look is at the plyr package. its excellent at splitting and reordering data. and one final note, you should avoid naming things with pre-existing R functions (e.g. data). 
Justin On Tue, Dec 20, 2011 at 11:14 AM, 1Rnwb sbpuro...@gmail.com wrote: hello gurus, i have a data frame like this HTN HTN_FDR Dyslipidemia CAD t1d_ptype[1:25] 1Y YY T1D 2 T1D 3 Ctrl_FDR 4 T1D 5Y Ctrl 6 Ctrl 7 Ctrl_FDR 8 T1D 9YY T1D 10 T1D 11 Ctrl_FDR 12 YY T1D 13 Y YY T1D 14 T1D 15 Ctrl 16 Ctrl 17 Ctrl_FDR 18 T1D 19 T1D 20 Y T1D 21 Ctrl_FDR 22 Ctrl_FDR 23 Ctrl 24 Ctrl 25 T1D i am converting it to define the groups more uniformly using this code: for( i in 1:dim(c1)[1]) { num_comp-0 for (j in 1:dim(c1)[2]) if (c1[i,j]==2) num_comp=num_comp+1 #Y=2 for (j in 1:dim(c1)[2]) if(num_comp0) { if (data$t1d_ptype[i] == T1D c1[i ,j] == 2) c2[i,j]-T1D_w if (data$t1d_ptype[i] == T1D c1[i, j] == 1) c2[i,j]-T1D_oc if(substr(data$t1d_ptype[i],1,4) == Ctrl c1[i,j] == 2) c2[i,j]-Ctrl_w if (substr(data$t1d_ptype[i],1,4) == Ctrl c1[i,j] == 1) c2[i,j]-Ctrl_oc } else { if(data$t1d_ptype[i] == T1D) c2[i,j]-T1D_noc if(substr(data$t1d_ptype[i],1,4) == Ctrl) c2[i,j]-Ctrl_noc } } it is giving me error In `[-.factor`(`*tmp*`, iseq, value = structure(c(NA, ... : invalid factor level, NAs generated Also it there a simple way to do this. Thanks Sharad -- View this message in context: http://r.789695.n4.nabble.com/Help-with-code-tp4218989p4218989.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http
Re: [R] how to manually enter an double quote as data feed?
"\"" is how it's displayed on the screen. However, if you write your object to a csv it will be correct. R can't display """ as it is, so it is escaping the second double quote for you. However, "'" (double quote, single quote, double quote) does display correctly as well as save correctly.

If that doesn't answer your question, some more back story on what you're trying to do would help.

Justin

On Tue, Dec 13, 2011 at 2:03 PM, bonnieyuan bby2...@columbia.edu wrote:

I'm doing a text mining project where I have to manually enter a double quote as an element inside a vector. I tried

char[10]='"' #where i enclosed the double quote in a pair of single quotes.

But the result is [1] "\"". Somehow a backslash is added automatically. I also tried to enclose the double quote in a pair of double quotes. That didn't work either. I'm using Mac and the latest release of R.

Thank you!
Bonnie Yuan

View this message in context: http://r.789695.n4.nabble.com/how-to-manually-enter-an-double-quote-as-data-feed-tp4192283p4192283.html
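A quick way to confirm that the backslash in the double-quote question is only part of the printed representation, not of the stored string. The vector name ch is arbitrary.

```r
ch <- character(10)
ch[10] <- '"'        # a single double-quote character

print(ch[10])        # print() shows the escaped form
cat(ch[10], "\n")    # cat() writes the raw character
nchar(ch[10])        # the string has length one
```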
Re: [R] using sample
Emma,

If you haven't spent much time on the r-help forums, please do read the posting guide. You need to provide reproducible examples for us to help you. We don't know anything about your data... What is event.details? (If you can't provide the data, often ?str will do.) Since I don't know what event.details is, I can't figure out what the line:

obs = (1:133429)[event.details[,2] == i]

is supposed to do. But if I had to guess... ?sample says it expects the first argument to be a vector. I assume obs is not a vector but a larger structure?

Feel free to post more info about your data (see ?str and ?dput), or if you can generate made-up data that replicates your problem, that works too.

Justin

On Wed, Dec 7, 2011 at 9:16 AM, bevare emma.ra...@jbaconsulting.co.uk wrote:

Hi,
Can anyone help sort out the problem with the following script - I am an R newbie and I am self taught.

obs.all = c()
for(i in 1:386){
if (n.sim[i] > 0){
obs = (1:133429)[event.details[,2] == i]
obs.all = c(obs.all, sample(obs[obs < n.sim[i]], size = n.sim[i], replace=T))
}
}

Basically, in the sample bit, I only want to get obs.all if the value of obs is less than the value of n.sim[i]. I get the error message

Error in sample(obs[obs < n.sim[i]], size = n.sim[i], replace = T) : invalid first argument

length(n.sim) is 386

Thanks in advance for your suggestions
Emma

View this message in context: http://r.789695.n4.nabble.com/using-sample-tp4169747p4169747.html
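Without the poster's data the cause of "invalid first argument" can only be guessed at. One plausible cause (an assumption, not confirmed in the thread) is that the vector handed to sample() is empty for some i. The sketch below guards against that and indexes through sample.int(), which also avoids sample()'s special treatment of a single number as 1:x. The vectors group and n.sim are made-up stand-ins for event.details[, 2] and the poster's n.sim.

```r
set.seed(1)

# Hypothetical stand-ins for event.details[, 2] and n.sim
group <- c(1, 1, 2, 2, 2, 3)
n.sim <- c(2, 0, 3)

obs.all <- integer(0)
for (i in seq_along(n.sim)) {
  if (n.sim[i] > 0) {
    pool <- which(group == i)          # row numbers belonging to event i
    if (length(pool) > 0) {            # skip empty pools instead of erroring
      picked <- pool[sample.int(length(pool), size = n.sim[i], replace = TRUE)]
      obs.all <- c(obs.all, picked)
    }
  }
}
obs.all
```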
Re: [R] hour in x-axis
Without knowing much about your data or the base plotting... I'd use the library ggplot2. First, you'll need to convert your times to POSIXct:

AggData$time <- as.POSIXct(AggData$time,format='%H:%M')

Then plotting is trivial.

ggplot(AggData,aes(x=time,y=value))+geom_point()

or +geom_line() if you'd rather.

Hope that helps,
Justin

On Tue, Nov 29, 2011 at 10:07 AM, threshold r.kozar...@gmail.com wrote:

Dear R users,
I've got the following problem. Given the AggData (listed below) I need to plot AggData[,2] vs time (AggData[,1]) for chosen 'rows'. I've done already:

plot(AggData[rows,2], xaxt='n')
axis(1,at=seq(1,length(rows),1),sub(,, AggData[rows,1]))

which works, but I need to list only chosen data points, say full hours or every 60th point, something like:

axis(1,at=seq(1,seq(1,length(rows),60)),sub(, , AggData[day.rows[seq(1,length(rows),60)],2]))

but this does not work. It would be nice if time on the x-axis were in H:m format (no seconds). In the original data the time step is 1 minute, e.g. 17:19:35, 17:20:35, 17:21:35. Taking every 100th row for brevity yields

(AggData[seq(1,length(rows),100),c(2,7)])
         time    value
1    17:19:35 80.68327
101  18:59:35 80.97230
201  20:39:35 78.30810
301  22:19:35 80.41558
401  23:59:35 77.01051
501  01:39:35 77.19687
601  03:19:35 78.20762
701  04:59:35 77.13315
801  06:39:35 76.29110
901  08:19:35 75.32090
1001 09:59:35 85.32890
1101 11:39:35 79.86978
1201 13:19:35 83.32418
1301 14:59:35 78.26018
1401 16:39:35 79.06434

Thanks in advance.
Best,
robert

View this message in context: http://r.789695.n4.nabble.com/hour-in-x-axis-tp4120142p4120142.html
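For the hour-axis question, a base-graphics sketch of the same idea as the ggplot2 answer: once the times are POSIXct, tick marks can be placed on full hours and labelled as H:M. The data below are simulated; the calendar date, the UTC time zone and the value series are assumptions made only so the example runs.

```r
# Hypothetical stand-in for AggData: one reading per minute
start <- as.POSIXct("2011-11-28 17:19:35", tz = "UTC")
AggData <- data.frame(
  time  = start + 60 * (0:299),
  value = 80 + sin((0:299) / 30)
)

plot(AggData$time, AggData$value, type = "l", xaxt = "n",
     xlab = "time", ylab = "value")

# Ticks on full hours only, labelled without seconds
ticks <- seq(from = as.POSIXct("2011-11-28 18:00:00", tz = "UTC"),
             to = max(AggData$time), by = "hour")
axis.POSIXct(1, at = ticks, format = "%H:%M")
```

Because the real series runs past midnight, the times would need a date attached (as here) before conversion, otherwise 01:39:35 sorts before 17:19:35.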
Re: [R] aggregate syntax for grouped column means
look at just your data that is in that first id category and I bet you can figure it out! myData[myData$id=='0m11',] var1 var2 id 10 30.79 32.15 0m11 11 30.79 32.39 0m11 12 30.94NA 0m11 aggregate performs the na.rm step on the entire row thus, a mean of 30.79. data.table and plyr perform the na.rm on each column. Justin On Tue, Nov 29, 2011 at 12:21 PM, Juliet Hannah juliet.han...@gmail.comwrote: I am calculating the mean of each column grouped by the variable 'id'. I do this using aggregate, data.table, and plyr. My aggregate results do not match the other two, and I am trying to figure out what is incorrect with my syntax. Any suggestions? Thanks. Here is the data. myData - structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61, 30.59, 30.84, 30.98, 30.79, 30.79, 30.94, 31.08, 31.27, 31.11, 30.42, 30.37, 30.29, 30.06, 30.3, 30.43, 30.61, 30.64, 30.75, 30.39, 30.1, 30.25, 31.55, 31.96, 31.87, 30.29, 30.15, 30.37, 29.59, 29.52, 28.96, 29.69, 29.58, 29.52, 30.21, 30.3, 30.25, 30.23, 30.29, 30.39), var2 = c(33.78, 33.25, NA, 32.05, 32.59, NA, 32.24, NA, NA, 32.15, 32.39, NA, 32.4, 31.6, NA, 30.5, 30.66, NA, 30.6, 29.95, NA, 31.24, 30.73, NA, 30.51, 30.43, 31.17, 31.44, 31.17, 31.18, 31.01, 30.98, 31.25, 30.44, 30.47, NA, 30.47, 30.56, NA, 30.6, 30.57, NA, 31, 30.8, NA), id = c(0m4, 0m4, 0m4, 0m5, 0m5, 0m5, 0m6, 0m6, 0m6, 0m11, 0m11, 0m11, 0m12, 0m12, 0m12, 205m1, 205m1, 205m1, 205m4, 205m4, 205m4, 205m5, 205m5, 205m5, 205m6, 205m6, 205m6, 205m7, 205m7, 205m7, 600m1, 600m1, 600m1, 600m3, 600m3, 600m3, 600m4, 600m4, 600m4, 600m5, 600m5, 600m5, 600m7, 600m7, 600m7)), .Names = c(var1, var2, id), row.names = c(NA, -45L), class = data.frame) head(myData) var1 var2 id 1 31.59 33.78 0m4 2 32.21 33.25 0m4 3 31.78NA 0m4 4 31.34 32.05 0m5 5 31.61 32.59 0m5 6 31.61NA 0m5 results1 - aggregate(. 
~ id ,data=myData,FUN=mean,na.rm=T) head(results1,1) #id var1 var2 # 1 0m11 30.79 32.27 library(data.table) mydt - data.table(myData) setkey(mydt,id) results2 - mydt[,lapply(.SD,mean,na.rm=TRUE),by=id] head(results2,1) # id var1 var2 # [1,] 0m11 30.84 32.27 library(plyr) results3 - ddply(myData,.(id),colwise(mean),na.rm=TRUE) head(results3,1) #id var1 var2 # 1 0m11 30.84 32.27 sessionInfo() R version 2.14.0 (2011-10-31) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] plyr_1.6 data.table_1.7.3 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
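To make the aggregate() explanation in this thread concrete: with the formula interface, the default na.action = na.omit drops every row that contains an NA before FUN is called, so na.rm = TRUE never sees those rows. Passing na.action = na.pass keeps them and lets mean() drop NAs column by column. The data below are the rows for two ids taken from the question.

```r
myData <- data.frame(
  var1 = c(30.79, 30.79, 30.94, 31.08, 31.27, 31.11),
  var2 = c(32.15, 32.39, NA, 32.40, 31.60, NA),
  id   = c("0m11", "0m11", "0m11", "0m12", "0m12", "0m12")
)

# Default: rows containing any NA are removed first
by_row <- aggregate(. ~ id, data = myData, FUN = mean, na.rm = TRUE)

# Keep all rows; NAs are removed per column inside mean()
by_col <- aggregate(. ~ id, data = myData, FUN = mean, na.rm = TRUE,
                    na.action = na.pass)

by_row
by_col
```

The second call reproduces the data.table and plyr results from the question (var1 = 30.84 for id 0m11).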
Re: [R] tip: large plots
Very cool. Sadly, as far as I can tell, it doesn't work with ggplot though :( x-runif(1e6) y-runif(1e6) system.time(plot(x,y,pch='.')) user system elapsed 0.824 0.012 0.845 system.time(plot(x,y)) user system elapsed 33.422 0.016 33.545 system.time(print(qplot(x,y))) user system elapsed 45.142 0.228 45.687 system.time(print(qplot(x,y,pch='.'))) user system elapsed 47.483 1.060 49.040 system.time(print(qplot(x,y,shape='.'))) user system elapsed 44.807 0.689 45.710 On Fri, Nov 18, 2011 at 11:03 AM, Sarah Goslee sarah.gos...@gmail.comwrote: Hi all, I'm working with a bunch of large graphs, and stumbled across something useful. Probably many of you know this, but I didn't and so others might benefit. Using pch=. speeds up plotting considerably over using symbols. x - runif(100) y - runif(100) system.time(plot(x, y, pch=.)) user system elapsed 1.042 0.030 1.077 system.time(plot(x, y)) user system elapsed 37.865 0.033 38.122 If you have enough points, the result is also more legible. Choice of which pch symbol makes a difference too, the default pch=1 being the slowest of what I tried, but . is by far the speediest. system.time(plot(x, y, pch=0)) user system elapsed 11.191 0.011 11.270 system.time(plot(x, y, pch=1)) user system elapsed 38.024 0.008 38.245 system.time(plot(x, y, pch=2)) user system elapsed 14.140 0.027 14.270 system.time(plot(x, y, pch=3)) user system elapsed 15.696 0.011 15.799 system.time(plot(x, y, pch=4)) user system elapsed 18.770 0.007 18.888 This is a vanilla R session, 2.13.1 for x86_64-redhat-linux-gnu. I haven't tried it on any other OS, but it's making my life a lot smoother right now. Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] tip: large plots
That is a function I did not know about, thanks Hadley! I still don't see the speed increase that you do with the base plot package, but I'm sticking with ggplot anyway! x-runif(1e6) y-runif(1e6) system.time(print(qplot(x,y))) user system elapsed 42.234 0.520 43.061 system.time(print(qplot(x,y,pch=I('.' user system elapsed 32.370 0.204 33.868 On Fri, Nov 18, 2011 at 12:39 PM, Hadley Wickham had...@rice.edu wrote: You need: system.time(print(qplot(x,y,pch=I('.' Hadley On Fri, Nov 18, 2011 at 1:30 PM, Justin Haynes jto...@gmail.com wrote: Very cool. Sadly, as far as I can tell, it doesn't work with ggplot though :( x-runif(1e6) y-runif(1e6) system.time(plot(x,y,pch='.')) user system elapsed 0.824 0.012 0.845 system.time(plot(x,y)) user system elapsed 33.422 0.016 33.545 system.time(print(qplot(x,y))) user system elapsed 45.142 0.228 45.687 system.time(print(qplot(x,y,pch='.'))) user system elapsed 47.483 1.060 49.040 system.time(print(qplot(x,y,shape='.'))) user system elapsed 44.807 0.689 45.710 On Fri, Nov 18, 2011 at 11:03 AM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi all, I'm working with a bunch of large graphs, and stumbled across something useful. Probably many of you know this, but I didn't and so others might benefit. Using pch=. speeds up plotting considerably over using symbols. x - runif(100) y - runif(100) system.time(plot(x, y, pch=.)) user system elapsed 1.042 0.030 1.077 system.time(plot(x, y)) user system elapsed 37.865 0.033 38.122 If you have enough points, the result is also more legible. Choice of which pch symbol makes a difference too, the default pch=1 being the slowest of what I tried, but . is by far the speediest. 
system.time(plot(x, y, pch=0)) user system elapsed 11.191 0.011 11.270 system.time(plot(x, y, pch=1)) user system elapsed 38.024 0.008 38.245 system.time(plot(x, y, pch=2)) user system elapsed 14.140 0.027 14.270 system.time(plot(x, y, pch=3)) user system elapsed 15.696 0.011 15.799 system.time(plot(x, y, pch=4)) user system elapsed 18.770 0.007 18.888 This is a vanilla R session, 2.13.1 for x86_64-redhat-linux-gnu. I haven't tried it on any other OS, but it's making my life a lot smoother right now. Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] apply on rows and columns?
To expand on what Sarah and Michael said: if you have a 3d array:

x<-array(1:4,c(2,2,4))
x
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 3

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 4

     [,1] [,2]
[1,]    1    3
[2,]    2    4

apply(x,c(1,2),sum)
     [,1] [,2]
[1,]    4   12
[2,]    8   16

a margin of c(1,2) makes more sense. Hope that clarifies things.

Justin

On Wed, Nov 16, 2011 at 12:18 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

Hi,

On Wed, Nov 16, 2011 at 3:13 PM, rkevinbur...@charter.net wrote:

I have the following scenario:

m <- matrix(1:4, ncol=2)
m
     [,1] [,2]
[1,]    1    3
[2,]    2    4
apply(m, 2, sum)
[1] 3 7
apply(m, 1, sum)
[1] 4 6

So I can apply to rows *or* columns. According to the documentation (?apply)

MARGIN a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.

But I get the following results:

apply(m, c(1,2), sum)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

How am I to interpret this result?

I'm pretty sure R is taking the sum of m[1,1] and putting it in [1,1], and the sum of m[1,2] and putting it in [1,2], and so on. You instructed apply() to work on rows and columns *simultaneously*, rather than sequentially. apply() on c(1,2) is useful if you have a matrix that's three-dimensional, but not so much if it's two-dimensional.

What are you trying to accomplish?

Sarah

-- Sarah Goslee http://www.functionaldiversity.org
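A small sketch contrasting the margins discussed in this thread. MARGIN names the dimensions that are kept; the function is applied over whatever dimensions are left. For a two-dimensional matrix, c(1, 2) keeps both, so each call sees a single cell and the matrix comes back unchanged.

```r
x <- array(1:4, c(2, 2, 4))

# Keep rows and columns, sum over the third dimension: a 2 x 2 result
cell_sums <- apply(x, c(1, 2), sum)

# Keep only the third dimension, sum each 2 x 2 slice: a length-4 vector
slice_sums <- apply(x, 3, sum)

# On a plain matrix nothing is left to sum over
m <- matrix(1:4, ncol = 2)
same <- apply(m, c(1, 2), sum)

cell_sums
slice_sums
same
```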
Re: [R] Extract pattern from string
Take a look at the structure of what Sys.time() returns:

str(Sys.time())

and now at ?strptime!

format(Sys.time(),format='%d-%H-%M-%S')
[1] "15-09-55-55"
format(Sys.time(),format='%Y')
[1] "2011"
format(Sys.time(),format='%m')
[1] "11"

Hope that helps,
Justin

On Tue, Nov 15, 2011 at 9:48 AM, syrvn ment...@gmx.net wrote:

Hello,
with Sys.time() you get the following string:

2011-11-15 16:25:55 GMT

How can I extract the following substrings:

year <- 2011
month <- 11
day_time <- 15_16_25_55

Cheers,
Syrvn

View this message in context: http://r.789695.n4.nabble.com/Extract-pattern-from-string-tp4073432p4073432.html
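The same format() approach applied to the exact timestamp from the question, as a sketch. A fixed time is used instead of Sys.time() so the result is reproducible, and underscores are used in the format string to match the requested day_time layout.

```r
# Fixed timestamp from the question instead of Sys.time()
now <- as.POSIXct("2011-11-15 16:25:55", tz = "GMT")

year     <- format(now, "%Y")
month    <- format(now, "%m")
day_time <- format(now, "%d_%H_%M_%S")

c(year = year, month = month, day_time = day_time)
```

Note that format() returns character strings; wrap the result in as.integer() if numbers are needed.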
Re: [R] Create design matrix
?expand.grid

    expand.grid(c("M","F"), c("Y","O"))
      Var1 Var2
    1    M    Y
    2    F    Y
    3    M    O
    4    F    O

Justin

On Thu, Nov 3, 2011 at 10:56 AM, Bond, Stephen stephen.b...@cibc.com wrote:
> Greetings useRs,
>
> What is the easiest way to create a design matrix of several factor variables? Function gendata in Design seems to do that for a fitted model, but how to do that only on several factor vectors?
>
> The result should be a df with one row for each distinct combination of levels of factors, e.g. for (M,F) (Y,O) we get
>
>     M Y
>     M O
>     F Y
>     F O
>
> In reality I will have more than 1000 rows, so doing it by hand is not good. Maybe there is a way with outer, but I couldn't see it.
>
> All the best to everybody.
> Stephen
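A slightly fuller sketch of the same call. The column names sex and age are assumptions chosen for illustration (the original post only gives the levels); naming the arguments avoids the default Var1/Var2 headers.

```r
# One row per combination of levels; the first factor varies fastest
design <- expand.grid(sex = c("M", "F"),
                      age = c("Y", "O"),
                      stringsAsFactors = FALSE)

# Number of rows is the product of the numbers of levels
n_rows <- nrow(design)
```

Adding further named vectors to the call extends the grid in the same way, so a design with more than 1000 rows needs no manual work.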
[R] mysterious warning message regarding bytecode...
While running a long script which source()s other scripts I get the following warning:

    Warning message:
    In t(object$S[[1]]) : bytecode version mismatch; using eval

I cannot replicate it if I run the sourced files line by line though...

What is that warning? And do I care about it? It doesn't seem to affect my output as far as I can tell.

Thanks!

Justin

    sessionInfo()
    R version 2.13.2 (2011-09-30)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
     [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] mgcv_1.7-9 stringr_0.5 RPostgreSQL_0.2-0 biglm_0.8 DBI_0.2-5 doMC_1.2.3 multicore_0.1-7
     [8] foreach_1.3.2 codetools_0.2-8 iterators_1.0.5 cairoDevice_2.19 pixmap_0.4-11 gridExtra_0.8.5 splancs_2.01-29
    [15] sp_0.9-91 ellipse_0.3-5 ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6 MASS_7.3-14

    loaded via a namespace (and not attached):
    [1] compiler_2.13.2 digest_0.5.1 lattice_0.19-33 Matrix_1.0-1 nlme_3.1-102
Re: [R] factor level issue after subsetting
First of all, the subsetting line is overly complicated.

    dat.sub <- dat[dat$treat != 'cont', ]

will work just fine.

R does exactly what you're describing. It knows the levels of the factor. Removing 'cont' from the data does not remove the level from the factor:

    df <- data.frame(let=factor(sample(letters[1:5],100,replace=T)), num=rnorm(100))
    str(df)
    'data.frame': 100 obs. of 2 variables:
     $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
     $ num: num 0.224 -0.523 0.974 -0.268 -0.61 ...
    df.sub <- df[df$let != 'a', ]
    str(df.sub)
    'data.frame': 82 obs. of 2 variables:
     $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
     $ num: num -0.523 -0.268 -0.61 -1.383 -0.193 ...
    unique(df.sub$let)
    [1] e d c b
    Levels: a b c d e
    df.sub$let <- factor(df.sub$let)
    unique(df.sub$let)
    [1] e d c b
    Levels: e d c b
    str(df.sub$let)
     Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...

By redefining your factor you can eliminate the problem. The other option, if you don't want factors to begin with, is:

    options(stringsAsFactors=FALSE)

to set the global option, or

    dat <- read.csv("~/MyFiles/data.csv", stringsAsFactors=FALSE)

to set the option locally for this single read.csv call.

On Tue, Nov 1, 2011 at 2:28 PM, Schreiber, Stefan stefan.schrei...@ales.ualberta.ca wrote:
> Dear list,
>
> I cannot figure out why, after sub-setting my data, that particular item which I don't want to plot is still in the newly created subset (please see example below). R somehow remembers what was in the original data set. A workaround is exporting and importing the new subset. Then it's all fine; but I don't like this idea and was wondering what I am missing here?
>
> Thanks!
> Stefan
>
> P.S. I am using R 2.13.2 for Mac.
>
>     dat <- read.csv("~/MyFiles/data.csv")
>     class(dat$treat)
>     [1] "factor"
>     dat
>        treat yield
>     1   cont  98.7
>     2   cont  97.2
>     3   cont  96.1
>     4   cont  98.1
>     5     10 103.0
>     6     10 101.3
>     7     10 102.1
>     8     10 101.9
>     9     30 121.1
>     10    30 123.1
>     11    30 119.7
>     12    30 118.9
>     13    60 109.9
>     14    60 110.1
>     15    60 113.1
>     16    60 112.3
>     plot(dat$treat,dat$yield)
>     dat.sub <- dat[which(dat$treat!='cont')]
>     class(dat.sub$treat)
>     [1] "factor"
>     dat.sub
>        treat yield
>     5     10 103.0
>     6     10 101.3
>     7     10 102.1
>     8     10 101.9
>     9     30 121.1
>     10    30 123.1
>     11    30 119.7
>     12    30 118.9
>     13    60 109.9
>     14    60 110.1
>     15    60 113.1
>     16    60 112.3
>     plot(dat.sub$treat,dat.sub$yield)
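As an alternative to re-running factor() on each column, base R also has droplevels(), which drops unused levels from a factor or from every factor in a data frame. The sketch below uses a made-up six-row stand-in for the poster's data, not the real file.

```r
# Small stand-in for the poster's data; treat is built as a factor
# explicitly so the example does not depend on stringsAsFactors
dat <- data.frame(
  treat = factor(c("cont", "cont", "10", "10", "30", "30")),
  yield = c(98.7, 97.2, 103.0, 101.3, 121.1, 123.1)
)

# Subsetting removes the rows but keeps "cont" as an (empty) level
dat.sub <- dat[dat$treat != "cont", ]
levels_before <- levels(dat.sub$treat)

# droplevels() removes levels that no longer occur in the data
dat.sub <- droplevels(dat.sub)
levels_after <- levels(dat.sub$treat)
```

After this, plot(dat.sub$treat, dat.sub$yield) should no longer reserve a slot for the empty 'cont' group.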
Re: [R] reshape2: Lost Values Between melt() and dcast()
The reason dcast would give that warning (not a failure) is that the formula you gave did not identify unique values. In that case dcast needs an aggregating function, which defaults to length.

However, the dcast calls that "failed" can be helpful for determining the source of your error. I'd look at the outputs of those two dcast calls and find cells where the length is greater than 1. Those are duplicated entries in your initial data.frames (when I've run into this it was usually due to NA values somewhere unexpected).

Hope that clarifies things.

Justin

On Mon, Oct 31, 2011 at 9:32 AM, Rich Shepard rshep...@appl-ecosys.com wrote:
> Working with 5 subset streams from my source data frame, three of them successfully call dcast(), but two fail:
>
>     jerritt.cast <- dcast(jerritt.melt, site + sampdate ~ param)
>     Aggregation function missing: defaulting to length
>
> and
>
>     winters.cast <- dcast(winters.melt, site + sampdate ~ param)
>     Aggregation function missing: defaulting to length
>
> Yet both data frames have the values in their .melt data frames:
>
>     summary(jerritt.melt)
>           site         sampdate             param       variable        value
>      JCM-1  :2178   Min.   :1978-03-28   pH     : 292   quant:7519   Min.   :    0.000
>      JCM-20A:2149   1st Qu.:1996-05-24   As     : 286                1st Qu.:    0.005
>      JC-E   : 476   Median :2000-05-31   SO4    : 271                Median :    0.650
>      JC     : 400   Mean   :2001-02-04   TDS    : 271                Mean   :  317.588
>      GD-1   : 395   3rd Qu.:2006-05-31   Cl     : 253                3rd Qu.:   27.000
>      JC-2   : 349   Max.   :2009-12-30   Zn     : 250                Max.   :20450.000
>      (Other):1572                        (Other):5896                NA's   : 2134.000
>
> and
>
>     summary(winters.melt)
>           site        sampdate             param      variable        value
>      WC     :601   Min.   :1987-07-23   As     : 96   quant:1189   Min.   :   0.00
>      WC-2   :327   1st Qu.:1994-06-15   TDS    : 79                1st Qu.:   0.05
>      WC-1   :261   Median :1995-07-27   NO3-N  : 74                Median :   7.59
>      BC-0.5 :  0   Mean   :1997-05-15   pH     : 72                Mean   :  79.20
>      BC-1   :  0   3rd Qu.:1996-07-29   SO4    : 69                3rd Qu.:  75.00
>      BC-1.5 :  0   Max.   :2011-06-06   Cl     : 64                Max.   :2587.00
>      (Other):  0                        (Other):735                NA's   : 252.00
>
> What might be causing dcast() to fail with these two data frames while it succeeds with three others processed using the same syntax?
>
> If additional information would help, let me know and I'll provide it.
>
> Puzzled,
> Rich
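One way to locate such duplicates before casting, sketched in base R with duplicated(). The data frame below is invented for illustration and only mimics the column names from the post (site, sampdate, param, value); it is not Rich's data.

```r
# Toy long-format data: site A has two pH readings on the same date
melted <- data.frame(
  site     = c("A", "A", "A", "B"),
  sampdate = as.Date(rep("2011-01-01", 4)),
  param    = c("pH", "pH", "As", "pH"),
  value    = c(7.1, 7.3, 0.01, 6.9),
  stringsAsFactors = FALSE
)

# The cast formula site + sampdate ~ param needs each
# (site, sampdate, param) combination to occur at most once
key <- melted[, c("site", "sampdate", "param")]

# Flag every row that belongs to a repeated combination
# (fromLast = TRUE catches the first occurrence as well)
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
dups <- melted[is_dup, ]
```

Rows returned in dups are the ones that would make dcast() fall back to length as the aggregation function.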
Re: [R] Replacing matching values by related values
In your assignment for t3 you use nt, which is undefined; thus t.n$treatment is all NAs. But:

    df <- data.frame(num=1:10, let=letters[1:10])
    dat <- data.frame(let=sample(letters[1:10],20,replace=T))
    dat$matched <- df$num[match(dat$let,df$let)]

should get you started.

On Sun, Sep 18, 2011 at 7:56 AM, Janssen, K.J.M. k.j.m.jans...@umcutrecht.nl wrote:
> Apologies, I wanted to make life easier by shortly describing my problem. Indeed, it is better to post the full code. I am not familiar with dput, but I have pasted the code that I have used below.
>
>     d <- matrix(NA,15,5)
>     d <- as.data.frame(d)
>     colnames(d) <- c("studynumber","t1","t2","t[,1]","t[,2]")
>     d$studynumber <- c(1:15) # add study numbers to select studies in scenarios
>     d$t1 <- c("car_pac","car_pac","cis_vin","car_pac","cis_doc","cis_gem","cis_gem","cis_vin","car_pac","car_doc","car_pac","car_pac","car_doc.pac","cis_vin","cis_iri")
>     d$t2 <- c("gef","bev_car_pac","cet_cis_vin","gef","gef","bev_cis_gem","cis_pem","cet_cis_vin","car_gem_pac","car_pem","erl","cis_pac","cet_car_doc.pac","cis_doc","car_pac")
>
>     # Link treatment to relating treatment number: make vector of all unique treatment options
>     t1 <- duplicated(c(d$t1,d$t2)) # returns TRUE and FALSE, so we can use it to select
>     t2 <- c(d$t1,d$t2) # combine both vectors, as treatments can be both reference and index treatment
>     t3 <- na.omit(ifelse(t1==FALSE,c(d$t1,d$t2),NA))[1:nt] # omit double treatment
>
>     # make dataset with first column all possible treatments, and second column their respective numbers
>     t.n <- matrix(NA,17,2) # list possible treatments (here 17), and link them to numbers
>     t.n <- as.data.frame(t.n)
>     colnames(t.n) <- c("treatment","numbers")
>     t.n$treatment <- t3
>     t.n$numbers <- 1:17
>
>     # link treatments in d with treatment numbers in dataset t.n
>
> Here is where I aim to fill d$t[,1] and d$t[,2] with the corresponding numbers from t.n.
>
> Thanks.
> Kristel
>
> -----Original message-----
> From: David Winsemius [mailto:dwinsem...@comcast.net]
> Sent: Sun 18-9-2011 15:20
> To: Janssen, K.J.M.
> CC: michael.weyla...@gmail.com; r-help@r-project.org
> Subject: Re: [R] Replacing matching values by related values
>
> On Sep 18, 2011, at 3:56 AM, Janssen, K.J.M. wrote:
>> Thanks Michael. I tested it and it works for numeric values, but not for the 'text' values that I am comparing, thus comparing "a" with "a", "b", etc. Any advice how I can solve it?
>
> Solve what? You never posted full working code and an explicit example. Unless there were actually objects named a, b, c, etc. in your workspace, the code that started out:
>
>     v <- c(f,a,e,d,m,
>
> would not have been meaningful except to hint at the possibility that you might be comparing character vectors. I assumed that d[,2] was actually letters[1:17] rather than what you wrote. It's especially important to indicate whether you have attached any objects.
>
> Post dput(head(d)) and dput(v) for the example part and include any code used to construct them.
>
> --
> david.
>
>> Thanks!
>>
>> -----Original message-----
>> From: R. Michael Weylandt michael.weyla...@gmail.com [mailto:michael.weyla...@gmail.com]
>> Sent: Sun 18-9-2011 2:27
>> To: Janssen, K.J.M.
>> CC: r-help@r-project.org
>> Subject: Re: [R] Replacing matching values by related values
>>
>> Try playing with match(). Something like
>>
>>     d[match(v,d[,1]),2]
>>
>> should work (untested because I'm writing from my phone though).
>>
>> Michael Weylandt
>>
>> On Sep 17, 2011, at 4:33 PM, Janssen, K.J.M. k.j.m.jans...@umcutrecht.nl wrote:
>>> I am trying to replace values of a vector (consisting of 15 values) by a value that is related to a matching value in a dataset (consisting of 17 rows). Here's an example.
>>>
>>> The vector:
>>>
>>>     v <- c(f,a,e,d,m,o,e,f,i,n,e,i,b,a,o)
>>>
>>> The dataset's columns consist of the following values:
>>>
>>>     d[,1] <- c(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q)
>>>     d[,2] <- 1:17
>>>
>>> So I want to end up with a vector that consists of the values of the second column, when the value of the vector matches the value of the first column. Thus, I aim to end up with a vector with the following values:
>>>
>>>     c(6,1,5,4,13,15,5,6,9,14,5,9,2,1,15)
>>>
>>> Help is appreciated!
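Putting the match() suggestion together with the original example, as a sketch that assumes the letters were meant as character strings. The column names treatment and number are made up for illustration.

```r
# The vector to translate, as quoted character strings
v <- c("f", "a", "e", "d", "m", "o", "e", "f", "i", "n", "e", "i", "b", "a", "o")

# Lookup table: first column the key, second column the related value
d <- data.frame(treatment = letters[1:17],
                number    = 1:17,
                stringsAsFactors = FALSE)

# match() gives, for each element of v, its row position in the key column;
# indexing the value column with those positions does the replacement
matched <- d$number[match(v, d$treatment)]
```

Any element of v that is absent from the key column would come back as NA, which is a convenient way to spot typos in the treatment names.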
Re: [R] R shell line width
You want options(width = ). You can set it interactively in the console, or edit your .Rprofile file (for example the .First function in there) so that it is set whenever you start R.

On Fri, Sep 16, 2011 at 12:48 PM, Mike P mike.polya...@gmail.com wrote:
> Hi,
>
> I want to apologize in advance if this has already been asked. I wasn't able to find any information, either on Google or from a local list search.
>
> I'm running an R shell from a Linux command line, in an xterm window. Whenever I print a data frame, only the first couple of columns are printed side by side; the others are repositioned below them. It seems something is limiting the line width of the output, even though there is enough horizontal space to fit each row on a single line.
>
> For example, this command:
>
>     data.frame(matrix(1:30,nrow=1))
>
> prints columns 1-21 on the first line, and the rest, 22-30, on the second.
>
> Is there a way I can configure R to increase the width of my output?
>
> Thanks.
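A short sketch of the setting in use. The value 200 is an arbitrary choice for a wide terminal, not something from the thread.

```r
# options() returns the previous values, so they can be restored later
old <- options(width = 200)
wide <- getOption("width")

# With a wide enough setting, all 30 columns fit on one line
print(data.frame(matrix(1:30, nrow = 1)))

# Restore whatever width was in effect before
options(old)
```

To make the change permanent, the same options(width = 200) line can go in ~/.Rprofile.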
Re: [R] map
I responded offline the first time, but: Google is your friend. Search for "R maps" and you'll find what I mention below. In the future, make sure to search Google and the help forums thoroughly before you post.

That said, you're looking for the maps package:

    install.packages('maps')
    library(maps)
    map('italy')

The ggplot2 package has a function called map_data that extracts the lines if you want the actual data; see the example Hadley provided in

    ?ggplot2::map_data

Hope that helps,

Justin

On Tue, Sep 13, 2011 at 8:48 AM, Batur swordligh...@gmail.com wrote:
> Adding to the previous question, I would like to map central Asia along with those five countries (Kazakhstan, Kyrgyzstan, Uzbekistan, Tajikistan and Turkmenistan). Please tell us the right database!
>
> Thanks a lot!
>
> --
> View this message in context: http://r.789695.n4.nabble.com/map-tp3810363p3810421.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] reshaping data
look at the melt function in reshape, specifically ?melt.data.frame require(reshape) Raw.melt-melt(RawData,id.vars='Year',variable_name='Month') there is an additional feature in the melt function for handling na values. names(Raw.melt)[3]-'CO2' head(Raw.melt) Year MonthCO2 1 1958 J NA 2 1959 J 315.58 3 1960 J 316.43 4 1961 J 316.89 5 1962 J 317.94 6 1963 J 318.74 you can order your data.frame if you'd like Raw.melt-Raw.melt[order(Raw.melt$Year,Raw.melt$Month),] head(Raw.melt) Year MonthCO2 1 1958 J NA 48 1958 F NA 95 1958 M 315.71 142 1958 A 317.45 189 1958 M.1 317.50 236 1958 J.1 NA On Wed, Sep 7, 2011 at 7:35 AM, B77S bps0...@auburn.edu wrote: I have the following data (see RawData using dput below) How do I get it in the following 3 column format (CO2 measurements are the elements of the original data frame). I'm sure the package reshape is where I should look, but I haven't figured out how. Thanks ahead of time Month Year CO2 J 1958 F 1958 M 1958315.71 A 1958317.45 M.1 1958317.5 J.1 1958 J.2 1958315.86 A.1 1958314.93 S 1958313.19 O 1958 N 1958313.34 D 1958314.67 J 1959315.58 F 1959316.47 # here is the data RawData - structure(list(Year = c(1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004), J = c(NA, 315.58, 316.43, 316.89, 317.94, 318.74, 319.57, 319.44, 320.62, 322.33, 322.57, 324, 325.06, 326.17, 326.77, 328.54, 329.35, 330.4, 331.74, 332.92, 334.97, 336.23, 338.01, 339.23, 340.75, 341.37, 343.7, 344.97, 346.29, 348.02, 350.43, 352.76, 353.66, 354.72, 355.98, 356.7, 358.36, 359.96, 362.05, 363.18, 365.32, 368.15, 369.14, 370.28, 372.43, 374.68, 376.79), F = c(NA, 316.47, 316.97, 317.7, 318.56, 319.08, NA, 320.44, 321.59, 322.5, 323.15, 324.42, 325.98, 326.68, 327.63, 329.56, 330.71, 331.41, 332.56, 333.42, 335.39, 336.76, 338.36, 
340.47, 341.61, 342.52, 344.51, 346, 346.96, 348.47, 351.72, 353.07, 354.7, 355.75, 356.72, 357.16, 358.91, 361, 363.25, 364, 366.15, 368.87, 369.46, 371.5, 373.09, 375.63, 377.37), M = c(315.71, 316.65, 317.58, 318.54, 319.69, 319.86, NA, 320.89, 322.39, 323.04, 323.89, 325.64, 326.93, 327.18, 327.75, 330.3, 331.48, 332.04, 333.5, 334.7, 336.64, 337.96, 340.08, 341.38, 342.7, 343.1, 345.28, 347.43, 347.86, 349.42, 352.22, 353.68, 355.39, 357.16, 357.81, 358.38, 359.97, 361.64, 364.03, 364.57, 367.31, 369.59, 370.52, 372.12, 373.52, 376.11, 378.41 ), A = c(317.45, 317.71, 319.03, 319.48, 320.58, 321.39, NA, 322.13, 323.7, 324.42, 325.02, 326.66, 328.13, 327.78, 329.72, 331.5, 332.65, 333.31, 334.58, 336.07, 337.76, 338.89, 340.77, 342.51, 343.56, 344.94, 347.08, 348.35, 349.55, 350.99, 353.59, 355.42, 356.2, 358.6, 359.15, 359.46, 361.26, 363.45, 364.72, 366.35, 368.61, 371.14, 371.66, 372.87, 374.86, 377.65, 380.52 ), M.1 = c(317.5, 318.29, 320.03, 320.58, 321.01, 322.24, 322.23, 322.16, 324.07, 325, 325.57, 327.38, 328.07, 328.92, 330.07, 332.48, 333.09, 333.96, 334.87, 336.74, 338.01, 339.47, 341.46, 342.91, 344.13, 345.75, 347.43, 348.93, 350.21, 351.84, 354.22, 355.67, 357.16, 359.34, 359.66, 360.28, 361.68, 363.79, 365.41, 366.79, 369.29, 371, 371.82, 374.02, 375.55, 378.35, 380.63), J.1 = c(NA, 318.16, 319.59, 319.78, 320.61, 321.47, 321.89, 321.87, 323.75, 324.09, 325.36, 326.7, 327.66, 328.57, 329.09, 332.07, 332.25, 333.59, 334.34, 336.27, 337.89, 339.29, 341.17, 342.25, 343.35, 345.32, 346.79, 348.25, 349.54, 351.25, 353.79, 355.13, 356.22, 358.24, 359.25, 359.6, 360.95, 363.26, 364.97, 365.62, 368.87, 370.35, 371.7, 373.3, 375.4, 378.13, 379.57 ), J.2 = c(315.86, 316.55, 318.18, 318.58, 319.61, 319.74, 320.44, 321.21, 322.4, 322.55, 324.14, 325.89, 326.35, 327.37, 328.05, 330.87, 331.18, 331.91, 333.05, 334.93, 336.54, 337.73, 339.56, 340.49, 342.06, 343.99, 345.4, 346.56, 347.94, 349.52, 352.39, 353.9, 354.82, 356.17, 357.03, 357.57, 359.55, 361.9, 
363.65, 364.47, 367.64, 369.27, 370.12, 371.62, 374.02, 376.62, 377.79), A.1 = c(314.93, 314.8, 315.91, 316.79, 317.4, 317.77, 318.7, 318.87, 320.37, 320.92, 322.11, 323.67, 324.69, 325.43, 326.32, 329.31, 329.4, 330.06, 330.94, 332.75, 334.68, 336.09, 337.6, 338.43, 339.82, 342.39, 343.28, 344.69, 345.91, 348.1, 350.44, 351.67, 352.91, 354.03, 355, 355.52, 357.49, 359.46, 361.49, 362.51, 365.77, 366.94, 368.12, 369.55, 371.49, 374.5, 375.86), S = c(313.19, 313.84, 314.16, 314.99, 316.26, 316.21, 316.7, 317.81, 318.64, 319.26, 320.33, 322.38, 323.1, 323.36, 324.84, 327.51, 327.44, 328.56, 329.3, 331.58, 332.76, 333.91, 335.88, 336.69,
Re: [R] Fitting my data to a Weibull model
This is what I use...

    fit.func <- function(x) {
      require(MASS)
      est <- fitdistr(x$wind_speed, 'weibull')$estimate
      data.frame(shape=est[1], scale=est[2])
    }

Feel free to correct me if this is wrong!

Justin

On Wed, Aug 31, 2011 at 6:21 AM, Dennis Murphy djmu...@gmail.com wrote:
> Hi:
>
> Things work if x is the response and y is the covariate. To use the approach I describe below, you need RStudio and its manipulate package (which is only available in RStudio - you won't find it on CRAN). You can download and install RStudio freely from http://rstudio.org/ ; it is available for Windows, Linux and Mac. To quote an old TV commercial line in the US: 'Try it, you'll like it' :)
>
> In the script below, the covariate has to be named x since the script calls the curve() function, which plots a mathematical function of a single variable named x. As a result, you need to interchange the names of your vectors.
>
> Within RStudio, copy and paste the following in chunks; in particular, copy and paste the code starting with 'manipulate(' and ending in ')' to generate the sliders for the parameter estimates. The idea is to tweak the parameter values until you get a fitted model that fits the observed data fairly closely. When you achieve that, kill the slider box (upper right corner); the estimates at the state where the sliders are closed are then saved in a list called start, which you use in the subsequent nls() call. After the model is fit, a sequence of x values is generated as new data, the predicted values at those points are computed, and a plot of the observed data with overlaid fitted model is produced.
>
> You have to be a bit careful; occasionally, you'll get an error
>
>     Error in nls(y ~ a - b * exp(-c * x^d), start = start) :
>       singular gradient
>
> If so, just try again with a different set of initial values, trying not to overdo it. You don't need to be exact, just close.
>
>     library('manipulate')
>
>     ### Weibull model:
>     x <- c(1,2,3,4,10,20)
>     y <- c(1,7,14,25,29,30)
>
>     ## Copy and paste the code chunk below into RStudio,
>     ## stopping with the line of hash marks
>     start <- list()
>     # Generate sliders to find good initial parameter estimates
>     manipulate(
>       {
>         plot(y ~ x)
>         a <- a0; b <- b0; c <- c0; d <- d0
>         curve(a-b*exp(-c*x^d), add=TRUE)
>         start <<- list(a=a, b=b, c=c, d=d)
>       },
>       a0 = slider(10, 50, step=0.1, initial = 30),
>       b0 = slider(0, 100, step=1, initial = 3),
>       c0 = slider(0, 0.1, step=0.01, initial = 0.01),
>       d0 = slider(0, 10, step=0.1, initial = 5)
>     )
>     ## Stop here
>     ##########
>
>     # Fit the model using the estimates from the sliders
>     weibm <- nls(y ~ a-b*exp(-c*x^d), start = start)
>     summary(weibm)
>
>     # Make predictions over a sequence of x values and plot
>     ndata <- data.frame(x = seq(0, 20, by = 0.1))
>     wpred <- predict(weibm, newdata = ndata)
>     plot(y ~ x, pch = 16)
>     lines(ndata$x, wpred, col = 'red')
>
>     ### Logistic:
>     start <- list()
>     manipulate(
>       {
>         plot(y ~ x)
>         a <- a0; b <- b0; d <- d0
>         curve(a/(1+b*exp(-d*x)), add=TRUE)
>         start <<- list(a=a, b=b, d=d)
>       },
>       a0 = slider(0, 50, step = 1, initial = 30),
>       b0 = slider(0, 20, step = 0.1, initial = 10),
>       d0 = slider(0, 1, step = 0.01, initial = 0.1)
>     )
>
>     logism <- nls(y ~ a/(1+b*exp(-d*x)), start = start)
>     summary(logism)
>
>     ldata <- data.frame(x = seq(0, 20, by = 0.1))
>     lpred <- predict(logism, newdata = ldata)
>     plot(y ~ x, pch = 16)
>     lines(ldata$x, lpred, col = 'red')
>
> This is a good exercise to learn how the various parameters affect the shape of the curve associated with a particular nonlinear model in one variable. It also helps to read about the model in question and understand the interpretation associated with each of the parameters. That way, you can use the sliders to visualize the effects of changes in one parameter when the others are held constant. If you find that the boundaries of the sliders are too restrictive, you can always reset them and try again.
>
> The code above came about from a few iterations of tweaking ranges for individual parameters (either wider or narrower as the case may be). I always keep the code in an editor so that it's easy to change, then copy and paste into the R console. If you redo the slider fitting, it's easier to reset the start list, too.
>
> You'll also notice that one parameter in each of the fitted models is nonsignificant, but you need to take into account that you're fitting models with three or four parameters to six data points.
>
> Aside: If you really meant to use y and x as response and covariate, respectively, in your posted data example, the sliders will show you that the two models are way off the mark, since y would start out slowly and then jump exponentially. That would require a completely different nonlinear model. You'll also notice that the estimates of a and b in the Weibull model are an order
[R] lubridate and intervals
Hiya, maybe there is a native R function for this, and if so please let me know!

I have 2 data.frames with start and end dates; they read in as strings and I am converting them to POSIXct. How can I check for overlap?

The end result ideally will be a single data.frame containing all the columns of the other two, with rows where there were date overlaps.

    df1 <- data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
                      end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
    df2 <- data.frame(start=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',sample(1:19,20,replace=T),sep='')),
                      end=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',sample(20:50,20),sep='')))

I tried:

    library(lubridate)
    df1$interval <- new_interval(df1$start,df1$end)
    df1$interval[1]
    [1] 2011-06-01 01:00:00 -- 2011-06-01 01:30:00
    df2$start[1]
    [1] "2011-06-01 01:17:00 PDT"

but

    df2$start[1] %in% df1$interval[1]
    [1] FALSE

This must be fairly straightforward and I just don't know where to look!

Thanks,

Justin
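One base-R way to approach the question, sketched under the assumption that two intervals overlap when each one starts no later than the other ends. The small data frames and the column names (id1, start1, end1, and so on) are invented for illustration; this is not a lubridate solution.

```r
# Helper to build UTC timestamps on a fixed day
at <- function(hhmm) as.POSIXct(paste("2011-06-01", hhmm), tz = "UTC")

df1 <- data.frame(id1 = 1:3,
                  start1 = at(c("01:00", "02:00", "03:00")),
                  end1   = at(c("01:30", "02:30", "03:30")))
df2 <- data.frame(id2 = 1:2,
                  start2 = at(c("01:17", "02:45")),
                  end2   = at(c("01:40", "03:10")))

# by = NULL makes merge() return every pairing of rows (a Cartesian product)
pairs <- merge(df1, df2, by = NULL)

# Keep the pairings whose intervals intersect
overlaps <- pairs[pairs$start1 <= pairs$end2 & pairs$start2 <= pairs$end1, ]
```

The result carries all columns of both inputs, one row per overlapping pair. The Cartesian product grows as nrow(df1) * nrow(df2), so this is only practical for modest table sizes.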
Re: [R] how to referee a dimension name via a variable?
Try:

    newnam <- paste('newdataday', dayno, sep='')
    plot(test[[newnam[1]]])

On Mon, Aug 29, 2011 at 12:29 PM, Jie TANG totang...@gmail.com wrote:
> hi, R-users
>
> I have a data.frame, for example test$newdataday24 and test$newdataday48. I can plot them by
>
>     plot(test$newdataday24)
>
> but now I want to plot different data by defining a variable to describe them:
>
>     dayno <- c(24,48)
>     newnam <- paste("test$newdataday", dayno, sep="")
>     plot(newnam[1])
>
> but I failed; the error message said that something is wrong with plot.window. What can I do to fix my script?
>
> thanks
>
> TANG Jie
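A self-contained sketch of the idea: the column name is built as a string and passed to [[ ]], which, unlike $, accepts a character variable. The toy values in test are invented for illustration.

```r
# Toy data frame with the two columns named in the question
test <- data.frame(newdataday24 = c(1, 3, 2),
                   newdataday48 = c(4, 6, 5))

dayno  <- c(24, 48)
newnam <- paste("newdataday", dayno, sep = "")

# [[ ]] looks the column up by the *value* of newnam[1]
first <- test[[newnam[1]]]

# Plot every column in turn (only when a graphics device makes sense)
if (interactive()) {
  for (nm in newnam) plot(test[[nm]], main = nm)
}
```

The original attempt failed because plot() was handed the character string itself rather than the column it names.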
Re: [R] debugging functions in R
Another great tool is debugonce(). Wrap your function name in it and then execute your function call:

    debugonce(my.function)
    out <- my.function(df)

and you'll be brought into the same interactive browser. (Its command set is small, e.g. n to step to the next line, c to continue and Q to quit, but it can take a little getting used to.)

Justin

On Wed, Aug 24, 2011 at 7:29 AM, Liviu Andronic landronim...@gmail.com wrote:
> On Wed, Aug 24, 2011 at 4:20 PM, Eran Eidinger e...@taykey.com wrote:
>> Hi,
>>
>> I am not sure if this is the right list to ask this question (though I did not find a more appropriate one).
>>
>> I've started using R a month ago, and small scripts work fine. However, when I start writing more complex code, it gets messy.
>>
>> 1. Is there any way to debug normally, with breakpoints?
>
>     fortune('browser')
>
>     My solution when I run into mysteries like this is to put 'browser()'
>     in the function just before or after the line of interest. The
>     magnitude and direction of my stupidity usually become clear quickly.
>        -- Patrick Burns
>           R-help (February 2006)
>
> Use browser() to inspect the environment and execute the code one step at a time.
>
> Liviu
>
>> 2. I am using the Eclipse plugin (StatET), and tried JGR(). Is there an IDE that enables breakpoints?
>> 3. Is there an equivalent to "include" in other programming languages? So many functions in one file are very messy. I would like to break it into several files.
>> 4. Any way to create a local context of variables inside a function? Otherwise I have to be careful to give different names inside functions to those in the workspace.
>>
>> I should point out that I am a long-time Matlab user and am probably expecting some things that don't necessarily exist in R...
>>
>> I know it's a lot; if there is a more appropriate forum to ask these, please point me in that direction.
>>
>> Thanks,
>> Eran.
>
> --
> Do you know how to read?
> http://www.alienetworks.com/srtest.cfm
> http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
> Do you know how to write?
> http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
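A small sketch of the related base-R debugging calls. my.function here is a made-up stand-in, and the interactive part is guarded so the script can also be run non-interactively; debug(), undebug(), isdebugged() and debugonce() are all base R.

```r
# A made-up function to step through
my.function <- function(df) {
  total <- sum(df$x)    # 'total' is local to the function call
  total / nrow(df)
}

# debug() flags the function until undebug() is called ...
debug(my.function)
flagged <- isdebugged(my.function)
undebug(my.function)

# ... whereas debugonce() only opens the browser for the next call
if (interactive()) {
  debugonce(my.function)
  out <- my.function(data.frame(x = 1:4))
}
```

On the scoping question in the quoted post: variables assigned inside a function, such as total above, live in that call's own environment and do not overwrite workspace objects of the same name.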
Re: [R] as.numeric() and POSIXct format
    as.POSIXct(518400, origin='2001-01-01')
    [1] "2001-01-07 PST"
    as.POSIXct(as.numeric(as.POSIXct(518400, origin='2001-01-01')), origin='1970-01-01')
    [1] "2001-01-07 08:00:00 PST"

On Wed, Aug 24, 2011 at 9:22 AM, Agustin Lobo agustin.l...@ija.csic.es wrote:
> Hi!
>
> I'm confused by this:
>
>     as.numeric(as.POSIXct(518400, origin="2001-01-01"))
>     [1] 978822000
>
> I guess the problem is that as.numeric() assumes a different origin, but I cannot find any default origin.
>
> How can I get back the seconds from the POSIXct format? In other words, what is the inverse function of as.POSIXct()? I've tried as.numeric() and unclass() using an origin= argument, but this does not work.
>
> Thanks
>
> Agus
>
> --
> Dr. Agustin Lobo
> Institut de Ciencies de la Terra Jaume Almera (CSIC)
> LLuis Sole Sabaris s/n
> 08028 Barcelona
> Spain
> Tel. 34 934095410
> Fax. 34 934110012
> email: agustin.l...@ija.csic.es
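To spell out what the two calls above show: as.numeric() on a POSIXct value counts seconds from 1970-01-01 UTC, so getting back seconds relative to another origin means subtracting that origin. The sketch below pins everything to UTC to keep local time zones out of the picture; the variable names are made up.

```r
# Custom origin and a time 518400 seconds (6 days) after it
origin <- as.POSIXct("2001-01-01", tz = "UTC")
x <- origin + 518400

# Seconds since the Unix epoch (1970-01-01 UTC)
since_epoch <- as.numeric(x)

# Seconds since the custom origin: either subtract the raw numbers ...
since_origin <- as.numeric(x) - as.numeric(origin)

# ... or use difftime() with explicit units
since_origin2 <- as.numeric(difftime(x, origin, units = "secs"))
```

Without an explicit tz, the origin string is interpreted in the local time zone in some versions of R, which would explain why the number in the question is an hour off from a UTC-based calculation.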
Re: [R] subsetting a list of matrices
His is better, but you can also use a for loop...

    out <- data.frame(rows=1:3, V1=NA)
    for (i in 1:3) {
      if (l[[i]][3] == 'Message 1') {
        out$V1[i] <- l[[i]][1]
      } else {
        out$V1[i] <- NA
      }
    }

but you shouldn't if your list is very long.

On Tue, Aug 23, 2011 at 9:35 AM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
>
>     subset(as.data.frame(do.call(rbind, lapply(l, "[", , 1))), row3 == "Message 1")
>
> On Tue, Aug 23, 2011 at 1:28 PM, Lara Poplarski larapoplar...@gmail.com wrote:
>> Hi all,
>>
>> I have an object that looks (roughly) like the following:
>>
>>     l <- list(a = matrix(rnorm(9), 3), b = matrix(rnorm(9), 3), c = matrix(rnorm(9), 3))
>>     l$a[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
>>     l$b[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
>>     l$c[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
>>     rownames(l$a) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
>>     rownames(l$b) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
>>     rownames(l$c) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
>>     colnames(l$a) <- c("V1", "V2", "V3")
>>     colnames(l$b) <- c("V1", "V2", "V3")
>>     colnames(l$c) <- c("V1", "V2", "V3")
>>
>> I want to extract values (row1, V1) for the three sublists a, b, c, but only for those cases in which row3 == "Message 1". Could someone suggest how to proceed?
>>
>> Many thanks in advance,
>> Lara
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
Re: [R] ddply - how to transform df column in place
Jean,

Ista is right, but: in your function you are asking as.Date to convert the whole data.frame df rather than just your daterep column.

    out <- ddply(d2, .(daterep), function(df) as.Date(strptime(df$daterep, format='%Y%m%d')))
    str(out)
    'data.frame':   30 obs. of  2 variables:
     $ daterep: num  20100801 20100802 20100803 20100804 20100805 ...
     $ V1     : Date, format: "2010-08-01" "2010-08-02" "2010-08-03" "2010-08-04" ...

On Tue, Aug 23, 2011 at 3:16 PM, jjap jean.plamon...@fpinnovations.ca wrote:

> Dear R-users,
>
> I am trying to get the plyr syntax right, without much success. Given:
>
>     d <- data.frame(cbind(x=1, y=seq(20100801,20100830,1)))
>     names(d) <- c("first", "daterep")
>     d2 <- d
>
>     # I can convert the daterep column in place the classic way:
>     d$daterep <- as.Date(strptime(d$daterep, format="%Y%m%d"))
>
>     # How to do it the plyr way?
>     ddply(d2, c("daterep"), function(df){as.Date(df, format="%Y%m%d")})
>     # returns: Error in as.Date.default(df, format = "%Y%m%d") :
>     #   do not know how to convert 'df' to class "Date"
>
> Thanks for any hints,
> ---jean
>
> --
> View this message in context: http://r.789695.n4.nabble.com/ddply-how-to-transform-df-column-in-place-tp3764037p3764037.html
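For a plain column conversion no split/apply step is needed at all. A small base R sketch (not plyr) using the same toy data; the as.character() call is my addition to make the number-to-string step explicit:

```r
d <- data.frame(first = 1, daterep = seq(20100801, 20100830, 1))

# Convert only the daterep column; every other column is left untouched.
d <- transform(d, daterep = as.Date(as.character(daterep), format = "%Y%m%d"))
str(d$daterep)
```

Unlike the ddply call, this keeps the original column name instead of producing a new V1 column.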
Re: [R] Help: Sort components of a vector with indices tracked in R
If you make your vector a data.frame, you will have row numbers accompanying your sorting:

    df <- data.frame(V1=c(1,4,3,2))
    df$rows <- row.names(df)
    df[order(df$V1),]

Also, you shouldn't use c as a variable name since it's an important R function... see your example :)

Justin

On Tue, Aug 23, 2011 at 4:59 PM, Chee Chen chee.c...@yahoo.com wrote:

> Dear All,
>
> I would like to know how to sort a vector of numeric values such that we know the original index of each ordered component. Say, we have
>
>     c <- c(1,4,3,2)
>     csort <- sort(c, decreasing=FALSE)
>
> With a few components of c, we can manually find out: csort[1] = 1 = c[1], i.e., the original index of csort[1] is 1; csort[2] = 2 = c[4], i.e., the original index of csort[2] is 4. When length(c) is very large, manual checking is infeasible. We can set up a for loop to compare and extract the index. However, is there an easier way to do this, so that the output is the sorted vector and the corresponding original indices?
>
> Thanks
> Chee
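order() also gives the original positions directly, without going through a data.frame. A short sketch (the names x, idx and s are mine):

```r
x   <- c(1, 4, 3, 2)
idx <- order(x)      # original index of each sorted element
x[idx]               # the sorted values
idx

# sort() can return both pieces at once.
s <- sort(x, index.return = TRUE)
s$x    # sorted values
s$ix   # their original positions
```

Here idx[2] is 4 because the second-smallest value sits in position 4 of x, which matches the manual check in the question.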
[R] ggplot in a function confusion!
What's going on here?

    df <- data.frame(x=1:10, y=1:10)
    ggplot() + geom_point(data=df, aes(x=x, y=y))             ## this is the normal usage right?
    ggplot() + geom_point(data=df, aes(x=df[,1], y=df[,2]))   ## but I can also feed it column indices
    ggplot() + geom_point(aes(x=df[,'x'], y=df[,'y']))        ## or column names.

    ## but if i wrap it in a function...
    plot.func.one <- function(dff, x.var, y.var){
      print(ggplot() + geom_point(aes(x=dff[,x.var], y=dff[,y.var])))
    }
    plot.func.two <- function(dff, x.var, y.var){
      print(ggplot() + geom_point(data=dff, aes(x=dff[,x.var], y=dff[,y.var])))
    }
    plot.func.three <- function(dff, x.var, y.var){
      print(ggplot() + geom_point(data=dff, aes(x=eval(x.var), y=eval(y.var))))
    }

    plot.func.one(df,1,2)       ## i assume the dff not found error is happening in the aes call rather than the data= portion..
    plot.func.one(df,'x','y')   ## but why does it work in the global env and not within a function?
    plot.func.two(df,1,2)
    plot.func.two(df,'x','y')

    var.x <- 'x'
    var.y <- 'y'
    plot.func.three(df,var.x,var.y)   ## why does it give the error on y.var instead of x.var?
    plot.func.three(df,'x','y')

    dff <- df
    x.var <- var.x
    y.var <- var.y
    plot.func.one(dff,x.var,y.var)    ## now whats going on? I assume this works because ggplot is looking globally rather than within the function...
    plot.func.two(dff,x.var,y.var)
    plot.func.three(dff,x.var,y.var)

Nothing seems to work right! How do I plot within a function where I can feed the function a data.frame and the columns I want plotted? I assume this is some interesting name space issue, but if you guys can enlighten me as to what's going on...

Thanks,
Justin

P.S. So before I sent this I dug some more and found my answer, aes_string:

    plot.func <- function(dff, x.var, y.var){
      print(ggplot() + geom_point(data=dff, aes_string(x=x.var, y=y.var)))
    }
    plot.func(df,'x','y')

works great. But I still wouldn't mind some clarification on what's happening in my earlier examples.
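My understanding, offered as an assumption about the ggplot2 of that era rather than a documented fact, is that aes() stores its arguments unevaluated and they are later looked up in the data and then in the global environment, not in the calling function. The base R sketch below only imitates that lookup; lazy_lookup(), by_expression() and by_name() are made-up helpers, not part of ggplot2:

```r
# Evaluate an expression in `data`, falling back to the global environment
# only. This mimics (by assumption) how the aes() expressions were resolved.
lazy_lookup <- function(expr, data) eval(expr, data, globalenv())

by_expression <- function(dff, x.var) lazy_lookup(quote(dff[, x.var]), dff)
by_name       <- function(dff, x.var) dff[[x.var]]

df <- data.frame(x = 1:3, y = 4:6)

# Fails unless objects called dff and x.var happen to exist globally.
res <- tryCatch(by_expression(df, "x"), error = function(e) conditionMessage(e))
res

# Passing the column name as a string sidesteps the scoping problem,
# which is the same idea as aes_string().
by_name(df, "x")
```

Under that assumption, the examples that "work" after dff, x.var and y.var are assigned at top level do so only because the global copies are found.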
[R] Sequential Naming of ggplot .pngs using plyr
If I have data:

    dat <- data.frame(a=rnorm(20), b=rnorm(20), c=rnorm(20), d=rnorm(20), site=rep(letters[5:8],each=5))

And want to plot like this:

    ctr <- 1
    for(i in c('a','b','c','d')){
      png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''), height=8.5, width=11, units='in', pointsize=9, res=300)
      print(ggplot(dat[,names(dat) %in% c('site',i)], aes(x=factor(site), y=dat[,i])) + geom_boxplot() + opts(title=paste('plot number',ctr,sep=' ')))
      dev.off()
      ctr <- ctr + 1
    }

Is there a way to do the same naming using plyr (or data.table or foreach, which I am not familiar with at all!)?

    m.dat <- melt(dat, id.vars='site')
    ddply(m.dat, .(variable), function(df) print(ggplot(df, aes(x=factor(site), y=value)) + geom_boxplot() + ..?)

And better yet, is there a way to do it using .parallel=T? Faceting is not really an option (unless I can facet onto multiple pages of a pdf or something) because these need to go into reports as individually labelled and titled plots.

As a bit of a corollary, is it really worth the headache to resolve this if I am only using melt/plyr to split on the four letter variables? With a larger set of data (1e6 rows), the melt/plyr version takes a significant amount of time, but .parallel=T drops the time significantly. Is the right answer a foreach loop, and can I do that with the increasing counter? (I haven't gotten beyond Hadley's .parallel feature in my parallel R dealings.)

    dat <- data.frame(a=rnorm(1e6), b=rnorm(1e6), c=rnorm(1e6), d=rnorm(1e6), site=rep(letters[5:8],each=2.5e5))
    ctr <- 1
    system.time(for(i in c('a','b','c','d')){
      png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''), height=8.5, width=11, units='in', pointsize=9, res=300)
      print(ggplot(dat[,names(dat) %in% c('site',i)], aes(x=factor(site), y=dat[,i])) + geom_boxplot() + opts(title=paste('plot number',ctr,sep=' ')))
      dev.off()
      ctr <- ctr + 1
    })
       user  system elapsed
     54.630   0.120  54.843

    system.time(
      ddply(melt(dat,id.vars='site'), .(variable), function(df) {
        png(file='/tmp/plyr_plot.png', height=8.5, width=11, units='in', pointsize=9, res=300)
        print(ggplot(df, aes(x=factor(site), y=value)) + geom_boxplot())
        dev.off()
      }, .parallel=F)
    )
       user  system elapsed
      58.40    0.13   58.63

    system.time(
      ddply(melt(dat,id.vars='site'), .(variable), function(df) {
        png(file='/tmp/plyr_plot.png', height=8.5, width=11, units='in', pointsize=9, res=300)
        print(ggplot(df, aes(x=factor(site), y=value)) + geom_boxplot())
        dev.off()
      }, .parallel=T)
    )
       user  system elapsed
      70.33    3.46   27.61

How might I speed this up and include the sequential plot names?

Thanks a bunch!
Justin
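One way to keep the sequential numbers without a counter, sketched in base R: build the file names up front and look each variable up by name. The file-name pattern follows the post; tempdir() and the helper name plot_file() are my own choices, and the png()/print()/dev.off() calls are left as a comment so the sketch runs without a graphics device:

```r
vars  <- c("a", "b", "c", "d")
files <- file.path(tempdir(), sprintf("plot_number_%d.png", seq_along(vars)))
names(files) <- vars

# Inside a per-variable function (plyr, foreach, lapply, ...) the number no
# longer depends on execution order, so it also works when run in parallel.
plot_file <- function(variable) unname(files[as.character(variable)])

for (v in vars) {
  # png(file = plot_file(v), ...); print(<ggplot for v>); dev.off()
  message(v, " -> ", basename(plot_file(v)))
}
```

Because the lookup is by name, the large data set itself never needs an extra counter column.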
Re: [R] Sequential Naming of ggplot .pngs using plyr
Thanks Ista,

In my real code that is exactly what I'm doing, but I want to prepend the names with a sequential number for easier reference once the pngs are made. My initial thought was to add the sequential number to the data before sending it to plyr and drawing it out there, but that seems like an excessive extra step when I have 1e6 - 1e7 rows.

Justin

On Wed, Aug 10, 2011 at 2:42 PM, Ista Zahn iz...@psych.rochester.edu wrote:

> Hi Justin,
>
> On Wed, Aug 10, 2011 at 5:04 PM, Justin Haynes jto...@gmail.com wrote:
>
>> Is there a way to do the same naming using plyr (or data.table or foreach, which I am not familiar with at all!)?
>
> This is not the same naming, but the same general idea can be achieved with plyr using
>
>     d_ply(melt(dat,id.vars='site'), .(variable), function(df) {
>       png(file=paste("plyr_plot", unique(df$variable), ".png"), height=8.5, width=11, units='in', pointsize=9, res=300)
>       print(ggplot(df, aes(x=factor(site), y=value)) + geom_boxplot())
>       dev.off()
>     })
>
> I'm not up to speed on .parallel, foreach etc., so I'll leave the rest to someone else.
>
> Best,
> Ista
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
[R] binary conversion list to data.frame with plyr... AND NO LOOPS!
Happy weekend helpeRs!

As usual, I'm stumped by R... My plan was to take an integer number, convert it to binary and wind up with a data.frame where each column is either 1 or 0 so I can see which bits are changing:

    bb <- function(i) ifelse(i, paste(bb(i %/% 2), i %% 2, sep=""), "")
    my.dat <- c(36,40,10,4)
    my.binary.dat <- bb(my.dat)
    my.list <- strsplit(my.binary.dat,'')
    max.len <- max(ldply(my.list,length))
    len <- length(my.list)
    my.df <- data.frame(two=rep(0,len), four=rep(0,len), eight=rep(0,len), sixteen=rep(0,len), thirtytwo=rep(0,len), sixtyfour=rep(0,len))
    for(i in 1:length(my.list)){
      for(j in 1:length(my.list[[i]])){
        my.df[i, max.len-length(my.list[[i]])+j] <- my.list[[i]][j]
      }
    }

But this isn't exactly feasible on a million+ rows where some binary numbers are 20 digits... I know there's a way without loops, I just know it! Ideally, I can do this to multiple columns of a data.frame and have them named accordingly (V1.two, V1.four... V2.two, V2.four, etc.)

Thanks,
Justin
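A loop-free sketch in base R: integer-divide every value by every power of two at once with outer(), then take the remainder mod 2. The function name to_bits() and the b32, b16, ... column names are hypothetical, and the number of bits is passed in rather than detected:

```r
to_bits <- function(x, n_bits = 6) {
  powers <- 2^((n_bits - 1):0)                # 32, 16, 8, 4, 2, 1 for 6 bits
  bits <- outer(x, powers, "%/%") %% 2        # one row per number, one column per bit
  colnames(bits) <- paste0("b", powers)
  as.data.frame(bits)
}

res <- to_bits(c(36, 40, 10, 4))
res
```

For several source columns, the same call could be applied per column and the names prefixed afterwards, e.g. names(res) <- paste("V1", names(res), sep = ".").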
[R] rle with NA values?
Happy Friday!

Using this function:

    fixSeq <- function(df) {
      shift1 <- function(x) c(1, x[-length(x)])
      df$state_shift <- df$state
      df.rle <- rle(df$state_shift)
      repeat {
        shifted.sf <- shift1(df.rle$values)
        change <- df.rle$values >= 4 & shifted.sf >= 4 & shifted.sf != df.rle$values
        if(any(change)) df.rle$values[change] <- shifted.sf[change] else break
      }
      gc()
      df$state_shift <- inverse.rle(df.rle)
      return(df)
    }

I would like a removed NA to split a run into two separate runs. To illustrate with a short example:

    dat <- data.frame(id=1, state=c(1,2,4,4,5,NA,5,5,1))
    fixSeq(dat)
    Error in df.rle$values[change] <- shifted.sf[change] :
      NAs are not allowed in subscripted assignments

    fixSeq(na.omit(dat))
      id state state_shift
    1  1     1           1
    2  1     2           2
    3  1     4           4
    4  1     4           4
    5  1     5           4
    7  1     5           4
    8  1     5           4
    9  1     1           1

rather than the true output of 1 2 4 4 4 5 5 1. The NA makes the second pair of 5s a unique state rather than a continuation of the previous state 4. Is this best accomplished by assigning NA to a value like -99? Or do I have other options?
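A sketch of one option that avoids a sentinel value: leave the NAs in place and treat any comparison involving NA as "no change", so an NA acts as a barrier between runs. The threshold of 4 follows the function above; fix_seq_na() is a hypothetical name and it works on the bare state vector rather than the data.frame:

```r
fix_seq_na <- function(state) {
  shift1 <- function(x) c(1, x[-length(x)])
  r <- rle(state)                       # rle() gives each NA its own run
  repeat {
    prev   <- shift1(r$values)
    change <- r$values >= 4 & prev >= 4 & prev != r$values
    change[is.na(change)] <- FALSE      # comparisons with NA never trigger a change
    if (!any(change)) break
    r$values[change] <- prev[change]
  }
  inverse.rle(r)
}

fixed <- fix_seq_na(c(1, 2, 4, 4, 5, NA, 5, 5, 1))
fixed
```

The NA stays in the result; dropping it afterwards with fixed[!is.na(fixed)] leaves the second pair of 5s untouched.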
[R] rle on large data . . . without a for loop!
I think I need to apply something like this:

    dat <- data.frame(id=rep(1:5,each=200),
                      state=sample(1:3, 1000, replace=T, prob=c(0.7,0.05,0.25)),
                      V1=runif(1,10,1000), V2=rnorm(1000))
    rle.dat <- rle(dat$state)
    temp <- 1
    out <- data.frame(id=1:length(rle.dat$lengths))
    for(i in 1:length(rle.dat$lengths)){
      temp2 <- temp + rle.dat$lengths[[i]] - 1   # last row of run i
      out$V1[i] <- mean(dat$V1[temp:temp2])
      out$V2[i] <- sum(dat$V2[temp:temp2])
      out$state[i] <- rle.dat$values[[i]]
      temp <- temp2 + 1                          # first row of the next run
    }

to a very large dataset. I want to apply a few summary functions to some variables within a data.frame for given states. To complicate things, I'd like to use plyr and split on the id variable before I do any of this...

    loop.func <- function(dat){
      rle.dat <- rle(dat$state)
      temp <- 1
      out <- data.frame(id=1:length(rle.dat$lengths))
      for(i in 1:length(rle.dat$lengths)){
        temp2 <- temp + rle.dat$lengths[[i]] - 1
        out$V1[i] <- mean(dat$V1[temp:temp2])
        out$V2[i] <- sum(dat$V2[temp:temp2])
        out$state[i] <- rle.dat$values[[i]]
        temp <- temp2 + 1
      }
      return(out)
    }
    out <- ddply(dat, .(id), loop.func)

Mostly, I just don't understand how to use a list (especially in this instance) in a plyr/apply statement...

Thanks,
Justin
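A loop-free sketch in base R: turn the run lengths into a run id for every row, then summarise by that id. The toy vectors are mine; for the per-id split, the same few lines could go inside the function handed to ddply:

```r
state <- c(1, 1, 3, 3, 3, 1)
V1    <- c(10, 20, 30, 40, 50, 60)
V2    <- 1:6

r      <- rle(state)
run_id <- rep(seq_along(r$lengths), r$lengths)   # 1 1 2 2 2 3

out <- data.frame(
  run   = seq_along(r$lengths),
  state = r$values,
  V1    = as.vector(tapply(V1, run_id, mean)),   # mean of V1 within each run
  V2    = as.vector(tapply(V2, run_id, sum))     # sum of V2 within each run
)
out
```

The run id also makes the run boundaries explicit, so there is no start/end bookkeeping to get wrong.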
[R] gridExtra with cairodevie and ggplots
I apologise in advance for not providing code, but this seems like a straightforward question...

I am making a few full-page plots, some of which are portrait and some of which are landscape. I would like to open my cairo device once and put all the plots in the same .pdf. But since some need to be rotated to fit the cairo device dimensions, is there a simple parameter to arrangeGrob (I'm using grid.arrange to generate the final plot) that will rotate the entire output 90 degrees so all my pages can be the same direction?

Thanks,
Justin
Re: [R] gridExtra with cairodevie and ggplots
That's perfect, thank you!

On Tue, Jun 14, 2011 at 2:10 PM, baptiste auguie baptiste.aug...@googlemail.com wrote:

> Hi,
>
> You can draw arrangeGrob in a rotated viewport,
>
>     library(gridExtra)
>     library(ggplot2)
>     ps = replicate(4, qplot(rnorm(10), rnorm(10)), simplify=F)
>     g = gTree(children=gList(do.call(arrangeGrob, ps)), vp=viewport(angle=90))
>     grid.draw(g)
>
> though you get some warnings about clipping for some reason. Perhaps more cleanly, you can define a print.arrange method (shamelessly borrowed from ggplot2),
>
>     print.arrange = function (x, newpage = is.null(vp), vp = NULL, ...) {
>       if (newpage) grid.newpage()
>       if (is.null(vp)) {
>         grid.draw(x)
>       } else {
>         if (is.character(vp)) seekViewport(vp) else pushViewport(vp)
>         grid.draw(x)
>         upViewport()
>       }
>     }
>     print(do.call(arrangeGrob, ps), vp=viewport(angle=90))
>
> HTH,
> baptiste
>
> On 15 June 2011 08:39, Justin Haynes jto...@gmail.com wrote:
>
>> [...] is there a simple parameter to arrangeGrob (im using grid.arrange to generate the final plot) that will rotate the entire output 90 degrees so all my pages can be the same direction?
[R] ragged data.frame? using plyr
I have a dataset that looks like:

    set.seed(144)
    sam <- sample(1000,100)
    dat <- data.frame(id=letters[1:10], value=rnorm(1000),
                      day=c(rep(1,100),rep(2,100),rep(3,100),rep(4,100),rep(5,100)))

I want to normalise it using the following function (unless you have a better idea...):

    adj.values <- function(dframe){
      value_mean <- mean(dframe$value)
      value_sd <- sd(dframe$value)
      norm_value <- (dframe$value-value_mean)/value_sd
      score_scale <- 100
      score_offset <- 1000
      scaled_value <- norm_value*score_scale+score_offset
      names(scaled_value) <- dframe$id
      return(scaled_value)
    }
    score_out <- ddply(dat, .(day), adj.values)

This gives me my data.frame all nice and pretty and ready to do the following:

    score_out.melt <- melt(score_out, id='day')
    names(score_out.melt) <- c('day','id','score')
    tblscore_mean <- tapply(score_out.melt$score, INDEX=score_out.melt$id, mean)
    tblscore_iqr <- tapply(score_out.melt$score, INDEX=score_out.melt$id, IQR)
    score_mean_iqr <- data.frame(id=names(tblscore_iqr), mean=tblscore_mean, iqr=tblscore_iqr)

However, as it turns out, my data look more like:

    dat <- dat[-sam,]
    ldply(dlply(dat, .(id,day), adj.values), length)

So on different days I only have data for some of the id variables, which leads to a ragged data.frame.

    ddply(dat, .(id,day), adj.values)

Can I do something like

    ldply(dlply(dat, .(id,day), adj.values), function(x){put in a NA for the places where data is missing?})

To give you a sense of where this is going, I'm eventually going to plot the mean of each id variable over the time period vs. its IQR (again, unless you have a better idea...).

As always, thanks for your help!
Justin
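A sketch that sidesteps the ragged result: scale within each day with ave(), which returns a vector as long as its input, and only then summarise per id. The toy data and the new column names are assumptions; the scale (100) and offset (1000) are the ones from the function above:

```r
dat <- data.frame(id    = c("a", "b", "c", "a", "b"),
                  day   = c(1, 1, 1, 2, 2),
                  value = c(1, 2, 3, 10, 30))

# Normalise within each day; missing id/day combinations are simply absent.
dat$score <- ave(dat$value, dat$day,
                 FUN = function(v) (v - mean(v)) / sd(v) * 100 + 1000)

# Per-id summaries, ready for plotting mean against IQR.
summary_by_id <- data.frame(id   = sort(unique(dat$id)),
                            mean = as.vector(tapply(dat$score, dat$id, mean)),
                            iqr  = as.vector(tapply(dat$score, dat$id, IQR)))

# If a rectangular id-by-day table is needed, tapply() fills the gaps with NA.
wide <- tapply(dat$score, list(dat$id, dat$day), mean)
wide
```

Keeping the data in long form means the missing combinations never have to be padded unless the wide table is actually wanted.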
[R] count value changes in a column
Is there a way to look for value changes in a column?

    set.seed(144)
    df <- data.frame(state=sample(rep(1:5,200),1000))

Any of the five states are acceptable. However if, for example, states 4 or 5 follow state 3, I want to overwrite them with 3. Changes from 1 to any value and 2 to any value are acceptable, as are changes from any value to 1 or 2. By way of an example, the sequence

    1 3 3 5 5 3 2 4 2 1 5 3 3 5

should read

    1 3 3 3 3 3 2 4 2 1 5 5 5 5

Thanks for the help!
Justin
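Going by the example sequence, the rule seems to be: once a state of 3 or more is entered, hold it until a 1 or 2 shows up. That reading is my assumption. A vectorised base R sketch (hold_state() is a made-up name):

```r
hold_state <- function(x) {
  low      <- x <= 2                           # states 1 and 2 are always kept
  prev_low <- c(TRUE, low[-length(x)])         # was the previous state 1 or 2?
  keep     <- low | prev_low                   # positions whose own value survives
  x[which(keep)[cumsum(keep)]]                 # carry the last kept value forward
}

fixed <- hold_state(c(1, 3, 3, 5, 5, 3, 2, 4, 2, 1, 5, 3, 3, 5))
fixed
```

cumsum(keep) numbers the kept positions, so indexing which(keep) with it maps every row back to the most recent kept row.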
Re: [R] count value changes in a column
I apologize for the confusion, but that solution will work with a twist. I want to record only the first value of a state change that goes above 2. So if the sequence is 344455544334 it should read all 3s, but 3442555414433 should read 33321.

Hope that helps clarify; if not, I can get there from your function, Bill.

Thanks!
Justin
[R] ggplot geom_boxplot vertical margins
If you plot:

    df <- data.frame(x=factor(1:100), y=rnorm(1000))
    ggplot(df, aes(x=x, y=y)) + geom_boxplot()

How do I remove those pesky margins on the sides of the plot area? Or maybe just reduce their size to something more like the spacing of the boxes?

Thanks,
Justin
Re: [R] ggplot geom_boxplot vertical margins
Exactly! Thanks, I couldn't find that anywhere!

On Wed, May 18, 2011 at 1:59 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote:

> Is this what you want? You can control how much space you want to see on the sides of the plot:
>
>     df <- data.frame(x=factor(1:100), y=rnorm(1000))
>     ggplot(df, aes(x=x, y=y)) + geom_boxplot() + scale_x_discrete(expand=c(0,0))
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA
> http://www.fws.gov/redbluff/rbdd_jsmp.aspx
[R] How do I break my addiction to for loops!?!?
I know I'm not supposed to use them... but they're just so easy! I have trouble defining an appropriate function for plyr or apply!

    data <- rnorm(144)
    groups1 <- c('a','b','c','d')
    groups2 <- c('aa','bb','cc','dd')
    machines <- 1:12
    df <- data.frame(machine=machines, group1=groups1, group2=groups2, U=data, V=2*data, W=data^2, X=1/data, Y=data+2, Z=2/data)

So... I am currently generating a table and a geom_boxplot and squishing them together with gridExtra. But for columns U, V and W I want to use group1 as my split variable, and for columns X, Y and Z I will use group2. I also need to make it as flexible as possible. What I've got now is...

    box.vars <- match(c('U','V','W'), colnames(df))
    index.group <- match('group1', colnames(df))
    group.types <- unique(df[,index.group])
    for(j in 1:length(group.types)){
      for(i in 1:length(box.vars)){
        index.rows <- which(df[,index.group]==group.types[j] & df[,box.vars[i]]!=0)
        p <- ggplot(data=df, aes(x=factor(df$machine[index.rows]), y=df[index.rows,box.vars[i]]))
        p <- p + geom_boxplot() + labs(x='Machine ID', y=names(df[box.vars[i]]))
        p <- p + opts(axis.text.x=theme_text(angle=50,size=7))
        mins <- round(tapply(df[index.rows,box.vars[i]], df$machine[index.rows], min), digits=3)
        maxes <- round(tapply(df[index.rows,box.vars[i]], df$machine[index.rows], max), digits=3)
        medians <- round(tapply(df[index.rows,box.vars[i]], df$machine[index.rows], median), digits=3)
        table.out <- data.frame(min=mins, median=medians, max=maxes)
        # + misc. gridExtra lines
      }
    }

Currently I hard code the box.vars and index.group, which is ok with me, but the for loops should be in a fancy function. Anyway, I'm sure there's an elegant plyr or apply that can do this for me... but as I said before, I need an FA Group (for loops anonymous)...

Also, this winds up being a lot of calcs on a big data set. So, if you have magical ff, big.memory and/or doMC suggestions I'm all ears; I just have very little understanding of how they work.

Thanks for your help,
Justin
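One way to wean off the nested loops, as a base R sketch: wrap the per-variable summary in a function that takes column names, then lapply() over the variables. The toy data and the name summarise_by() are mine; the plotting and gridExtra steps are left out:

```r
summarise_by <- function(df, value_col, group_col) {
  stats <- function(v) c(min = min(v), median = median(v), max = max(v))
  # split() gives one vector per machine; sapply() returns the stats in
  # columns, so t() turns the result into one row per machine.
  round(t(sapply(split(df[[value_col]], df[[group_col]]), stats)), digits = 3)
}

df <- data.frame(machine = rep(1:3, each = 4),
                 U = c(1, 2, 3, 4, 10, 20, 30, 40, 5, 5, 5, 5))
df$V <- 2 * df$U

# One summary table per variable, returned as a named list.
tables <- lapply(c(U = "U", V = "V"), summarise_by, df = df, group_col = "machine")
tables$U
```

Passing column names as strings keeps the function flexible: switching from group1 to group2 would just be a different argument.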
[R] xtable without a loop alongside a ggplot
I would like to create a table of my points and identify which 'quadrant' of a plot they are in, with the 'origin' at the means. The kicker is I would like to display it right next to or below a ggplot of the data. Maybe xtable isn't the right thing to use, but it's the only thing I can think of. Any help is appreciated!

    set.seed(144)
    x = rnorm(100,mean=5,sd=1)
    test <- data.frame(x=x, y=x^2)
    test$right <- sapply(test$x, function(x) {mean.x <- mean(test$x); any(x > mean.x)})
    test$up <- sapply(test$y, function(y) {mean.y <- mean(test$y); any(y > mean.y)})
    for(i in 1:length(test$x)){
      if(test$right[i]==TRUE & test$up[i]==TRUE) print(paste(rownames(test[i,]),'is in the upper right quadrant'))
      if(test$right[i]==FALSE & test$up[i]==TRUE) print(paste(rownames(test[i,]),'is in the upper left quadrant'))
      if(test$right[i]==TRUE & test$up[i]==FALSE) print(paste(rownames(test[i,]),'is in the lower right quadrant'))
      if(test$right[i]==FALSE & test$up[i]==FALSE) print(paste(rownames(test[i,]),'is in the lower left quadrant'))
    }

I know there's a better way than using a for loop! And I haven't the foggiest how to use xtable. As I said, the ultimate goal is to create a plot with a table alongside it showing outliers and where they appear, using the inout function from the splancs package and a confidence ellipse from the ellipse package.

Thank you for your help as usual!
Justin
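The quadrant labels can be built without sapply or a loop, since comparisons and ifelse() are already vectorised. A small base R sketch with made-up points (the table/plot layout part of the question is not covered here):

```r
x <- c(1, 2, 3, 4)
y <- c(10, 1, 1, 10)

right <- x > mean(x)      # TRUE where the point is right of the mean of x
up    <- y > mean(y)      # TRUE where the point is above the mean of y

quadrant <- paste(ifelse(up, "upper", "lower"), ifelse(right, "right", "left"))
result <- data.frame(x, y, quadrant)
result
```

The resulting data.frame is the kind of object that table-drawing functions usually accept directly.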
[R] MASS fitdistr with plyr or data.table?
I am trying to extract the shape and scale parameters of a wind speed distribution for different sites. I can do this in a clunky way, but I was hoping to find a way using data.table or plyr. However, when I try I am met with the following:

    set.seed(144)
    weib.dist <- rweibull(1,shape=3,scale=8)
    weib.test <- data.table(cbind(1:10,weib.dist))
    names(weib.test) <- c('site','wind_speed')
    fitted <- weib.test[, fitdistr(wind_speed,'weibull'), by=site]
    Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
      invalid to set the class to matrix unless the dimension attribute is of length 2 (was 0)
    In addition: Warning messages:
    1: In dweibull(x, shape, scale, log) : NaNs produced
    ...
    10: In dweibull(x, shape, scale, log) : NaNs produced

(the warning messages are normal from what I can tell)

or using plyr:

    set.seed(144)
    weib.dist <- rweibull(1,shape=3,scale=8)
    weib.test.too <- data.frame(cbind(1:10,weib.dist))
    names(weib.test.too) <- c('site','wind_speed')
    fitted <- ddply(weib.test.too, .(site), fitdistr, 'weibull')
    Error in .fun(piece, ...) : 'x' must be a non-empty numeric vector

Those sound like similar errors to me, but I can't figure out how to make them go away! To prove I'm not crazy:

    fitdistr(weib.dist,'weibull')$estimate
       shape    scale
    2.996815 8.009757
    Warning messages:
    1: In dweibull(x, shape, scale, log) : NaNs produced
    2: In dweibull(x, shape, scale, log) : NaNs produced
    3: In dweibull(x, shape, scale, log) : NaNs produced
    4: In dweibull(x, shape, scale, log) : NaNs produced

Thanks
Justin
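As far as I can tell, both errors come from handing the wrong kind of object around: ddply passes the whole data.frame piece to fitdistr as x, and the data.table call gets a full fitdistr object back where it expects plain columns. A sketch that fits per site and keeps only the estimates, in base R plus MASS; the data are simulated and the 500 observations per site are arbitrary:

```r
library(MASS)

set.seed(144)
sites <- rep(1:3, each = 500)
wind  <- rweibull(length(sites), shape = 3, scale = 8)

# One fit per site; keep just the named estimate vector from each fit.
# suppressWarnings() hides the "NaNs produced" notes seen in the post.
fits <- t(sapply(split(wind, sites), function(v) {
  suppressWarnings(fitdistr(v, "weibull"))$estimate
}))
fits   # one row per site, columns "shape" and "scale"
```

The same anonymous function, wrapped so it returns a one-row data.frame, should also be usable as the function given to ddply; I have not run that variant here.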
[R] MASS fitdistr call in plyr help!
I have a set of wind speeds read at different locations. The data is a data frame with two columns: site and wind speed. I want to split the data on site and call a function to find the shape and scale parameters of a Weibull distribution fit. The end result is a plot with x-axis = shape and y-axis = scale. Currently my code looks like:

    fit_wind_speed <- function(x){
      x <- replace(x, x<=0, 0.0001)
      temp <- fitdistr(na.exclude(x[,1]), "weibull")
      l <- length(names(x))
      for(i in 1:l){
        temp[i] <- (fitdistr(na.exclude(x[,i]), "weibull"))
      }
      temp
    }

    wind_speed_wide_dataframe <- function(x){
      mini <- min(x$site)
      maxi <- max(x$site)
      ws.plot <- as.matrix(subset(x, site==mini, select=(wind_speed)))
      row.names(ws.plot) <- NULL
      for(i in (mini+1):maxi){
        temp <- as.matrix(subset(x, site==i, select=(wind_speed)))
        row.names(temp) <- NULL
        ws.plot <- add.col(ws.plot, temp)
      }
      as.data.frame(ws.plot)
    }

    ws.plots <- wind_speed_wide_dataframe(dataset[,c(1,3)])
    names(ws.plots) <- c(min(dataset$site):max(dataset$site))
    fit <- fit_wind_speed(ws.plots)
    names(fit) <- names(ws.plots)
    l <- length(fit)
    i <- 1:l
    j <- 1:2
    temp2 <- data.frame(1:l,2)
    temp <- data.frame(names(fit),2)
    for(i in 1:l){temp <- data.frame(fit[i])}
    for(i in 1:l){temp[i] <- data.frame(fit[i])}
    for(i in 1:l){temp2[i,j] <- temp[j,i]}
    names(temp2) <- c("shape","scale")

I'd like to combine the two functions into one plyr call, but I can't figure out how it would work! If there is a better package than MASS, I'm all ears for that too.

Thanks,
Justin
[R] string interpolation
Is there a way to do this in R? I have data in the form:

    57_input 57_output 58_input 58_output etc.

Can I use a for loop (i in 57:n) that plots only the outputs? I want this to be robust, so I'm not specifying a column id but rather something like the C++-style "%s_input", i. Is that doable in R?

Thanks,
justin
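sprintf() is the closest base R equivalent to printf-style interpolation, and it is vectorised. A short sketch with a made-up data.frame; the plotting call is left as a comment:

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
names(df) <- c("57_input", "57_output", "58_input", "58_output")

# Build the column names from the numbers.
out_cols <- sprintf("%d_output", 57:58)
out_cols

for (col in out_cols) {
  values <- df[[col]]         # plot(values) would go here
}

# Or pick the columns by pattern, without knowing the numbers in advance.
by_pattern <- grep("_output$", names(df), value = TRUE)
by_pattern
```

The grep() variant is probably the more robust of the two, since it does not assume the ids form an unbroken range.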
[R] linear regression in a data.frame using recast
I have a very large dataset with columns of id number, actual value, predicted value. This used to be a time series but I have dropped the time component. So I now have a data.frame where the id number is repeated but each value in the actual and predicted columns is unique. I assume I need to use recast somehow, but I'm at a loss... How can I perform a simple linear regression (using lm()?) on my two variables for each unique id number? Additionally, I need to fix the y-intercept at zero.

Thanks for your help,
Justin
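No reshaping should be needed for this: split the data.frame by id and fit one model per piece. A base R sketch with made-up numbers; regressing actual on predicted (rather than the other way round) is my assumption, as are the column names:

```r
df <- data.frame(id        = c(1, 1, 1, 2, 2, 2),
                 predicted = c(1, 2, 3, 1, 2, 3),
                 actual    = c(2.1, 3.9, 6.0, 2.9, 6.2, 8.9))

# "0 +" in the formula drops the intercept, i.e. forces the line through zero.
slopes <- vapply(split(df, df$id), function(d) {
  unname(coef(lm(actual ~ 0 + predicted, data = d)))
}, numeric(1))
slopes   # one slope per id, named by id
```

On a very large data set it may be cheaper to compute the through-origin slope directly as sum(x * y) / sum(x^2) per id, which gives the same number as lm() here.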