Re: [R] Producing a table with mean values
My stupidity. I made a late edit for clarity and forgot to run it to be sure I had changed everything. It should read:

tabx <- ddply(meltx, .(Seamount, variable), summarize, mean = mean(value), sd = sd(value))

My apologies.

John Kane
Kingston ON Canada

> -----Original Message-----
> From: smartpink...@yahoo.com
> Sent: Sat, 8 Sep 2012 11:14:11 -0700 (PDT)
> To: jrkrid...@inbox.com
> Subject: Re: [R] Producing a table with mean values
>
> Hi John,
> I am getting error messages with your solution.
>
> tabx <- ddply(nn, .(Seamount, variable), summarize, mean = mean(value), sd = sd(value))
> # Error in empty(.data) : object 'nn' not found
>
> A.K.
>
>> ----- Original Message -----
>> From: John Kane jrkrid...@inbox.com
>> To: Tinus Sonnekus tsonne...@gmail.com; r-help@r-project.org
>> Sent: Saturday, September 8, 2012 1:19 PM
>> Subject: Re: [R] Producing a table with mean values
>>
>> x <- "Seamount Pico Nano Micro Total_Ch
>> Off_Mount1 0.0691 0.24200 0.00100 0.31210
>> Off_Mount1 0.0938 0.00521 0.02060 0.11961
>> Off_Mount1 0.1130 0.2 0.06620 0.37920
>> Off_Mount1 0.0864 0.15900 0.22300 0.46840
>> Off_Mount1 0.0262 0.04570 0.00261 0.07451
>> Off_Mount2 0.0314 0.17400 0.12800 0.33340
>> Off_Mount2 0.0314 0.17400 0.12800 0.23340
>> Off_Mount2 0.0414 0.17400 0.02800 0.23340"
>> xx <- read.table(textConnection(x), header=TRUE, as.is=TRUE)
>> library(reshape)
>> meltx <- melt(xx)
>> tabx <- ddply(nn, .(Seamount, variable), summarize, mean = mean(value), sd = sd(value))
>> tabx
>>
>> John Kane
>> Kingston ON Canada
>>
>>> -----Original Message-----
>>> From: tsonne...@gmail.com
>>> Sent: Fri, 7 Sep 2012 22:49:55 +0200
>>> To: r-help@r-project.org
>>> Subject: [R] Producing a table with mean values
>>>
>>> Hi All,
>>> I have a data set with three size classes (pico, nano and micro) and 12 different sites (Seamounts). I want to produce a table with the mean and standard deviation values for each site.
>>>
>>>   Seamount    Pico   Nano    Micro   Total_Ch
>>> 1 Off_Mount 1 0.0691 0.24200 0.00100 0.31210
>>> 2 Off_Mount 1 0.0938 0.00521 0.02060 0.11961
>>> 3 Off_Mount 1 0.1130 0.2 0.06620 0.37920
>>> 4 Off_Mount 1 0.0864 0.15900 0.22300 0.46840
>>> 5 Off_Mount 1 0.0262 0.04570 0.00261 0.07451
>>> 6 Off_Mount 2 0.0314 0.17400 0.12800 0.33340
>>>
>>> I tried the following script but get an error message:
>>>
>>> Error in results[i, "u.Pico", "u.Nano", "u.Micro"] <- sapply(z, mean) :
>>>   incorrect number of subscripts
>>>
>>> The code I used:
>>>
>>> SChla <- read.csv("SM_Chla_data.csv")
>>> sm <- as.character(unique(SChla$Seamount))
>>> results <- matrix(NA,nrow=length(sm),ncol=6,dimnames=list(sm,c("u.Pico","u.Nano","u.Micro","sd.Pico","sd.Nano","sd.Micro")))
>>> for (i in sm){
>>>   z <- subset(SChla, Seamount==i, select=c(Pico, Nano, Micro))
>>>   results[i,"u.Pico","u.Nano","u.Micro"] <- sapply(z, mean)
>>>   results[i,"sd.Pico","sd.Nano","sd.Micro"] <- sapply(z, sd)
>>> }
>>> print(results)
>>>
>>> Please can someone advise me how to fix the error, or suggest an alternative solution? I would appreciate it.
>>>
>>> Thank you.
>>> Tinus

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
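The "incorrect number of subscripts" error in the original loop most likely comes from passing the three column names as three separate subscripts to a two-dimensional matrix; they need to be combined into a single character vector. A minimal base-R sketch of that fix follows. The small data frame is a made-up stand-in for SM_Chla_data.csv (values copied from rows quoted in the thread), and the helper names mean.cols and sd.cols are illustrative, not from the original post.

```r
# Toy stand-in for read.csv("SM_Chla_data.csv"); values taken from the thread.
SChla <- data.frame(
  Seamount = c("Off_Mount1", "Off_Mount1", "Off_Mount2", "Off_Mount2"),
  Pico  = c(0.0691, 0.0938, 0.0314, 0.0414),
  Nano  = c(0.24200, 0.00521, 0.17400, 0.17400),
  Micro = c(0.00100, 0.02060, 0.12800, 0.02800)
)

sm <- as.character(unique(SChla$Seamount))
mean.cols <- c("u.Pico", "u.Nano", "u.Micro")      # illustrative helper names
sd.cols   <- c("sd.Pico", "sd.Nano", "sd.Micro")

results <- matrix(NA_real_, nrow = length(sm), ncol = 6,
                  dimnames = list(sm, c(mean.cols, sd.cols)))

for (i in sm) {
  z <- SChla[SChla$Seamount == i, c("Pico", "Nano", "Micro")]
  # One row subscript and ONE column subscript (a character vector)
  results[i, mean.cols] <- sapply(z, mean)
  results[i, sd.cols]   <- sapply(z, sd)
}

print(results)
```

The ddply() answer above gives the same numbers in long format; this version only keeps the poster's wide matrix layout.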
[R] Error msg in rpanel
I am working with the rpanel package. If I have a function that uses a radiogroup button and I attempt to run that function from inside the rpanel menu, I get this error:

Error in panel$intname : $ operator is invalid for atomic vectors

However, if I run the function on its own, i.e. not from inside the rpanel menu but by calling it independently, the error doesn't appear. Here is a simple example. Try running the whole code versus just running the add() function. The former results in the above error and the latter doesn't.

install.packages(c("rpanel","tkrplot"))

my.menu <- function(panel) {
  library(rpanel,tkrplot)
  if (panel$menu=="Add"){
    add()
  } else panel
}

main.panel <- rp.control(title = "Main Menu",size=c(200,150))
rp.menu(panel = main.panel, var = menu, labels = list(list("Addition", "Add")), action = my.menu)

# function to do addition
add <- function(){
  my.draw <- function(panel) {
    if(panel$vals=="numbers"){
      val<-as.numeric(panel$nmbr1)+as.numeric(panel$nmbr2)
    } else if(panel$vals=="strings"){
      val <- paste(as.character(panel$nmbr1), "and" ,as.character(panel$nmbr2))
    }
    plot(1:10, 1:10, type="n", xlab="", ylab="", axes=FALSE, frame = TRUE)
    text(5, 5, paste("Result: ", val),cex=1.4)
    panel
  }
  my.redraw <- function(panel) {
    rp.tkrreplot(panel, my.tkrplot)
    panel
  }
  my.panel <- rp.control(title = "Addition")
  rp.textentry(panel = my.panel, var = nmbr1, labels = "First: ", action = my.redraw, initval=100)
  rp.textentry(panel = my.panel, var = nmbr2, labels = "Second:", action = my.redraw, initval=200)
  rp.radiogroup(panel = my.panel, var = vals, values = c("numbers", "strings"), action = my.redraw, title = "Type")
  rp.tkrplot(panel = my.panel, name = my.tkrplot, plotfun = my.draw)
}
Re: [R] method or package to make special boxplot
On 09/09/2012 12:14 AM, Zhang Qintao wrote:
> Hi, All,
> I am trying to use R to make the following type of boxplot while I couldn't find a way to do it. My dataset looks like X1 Y1 X2 Y2 SPLIT. The split highlights my experiment details and both x and y are continuous numerical values. I need to plot y vs. x with split as legend and boxplot has to be used for all splits. May I ask how to get it? Currently available boxplot only applies on the case that X axis is character.

Hi Qintao,
Do you want a sort of 2D boxplot? The example below gives a rough idea as to what it would look like, with boxplots for your Xs and Ys centered at their medians and an abscissa with the labels for your splits. Needs a bit of work to turn this into a function, so let me know if it does what you want.

Jim

x1<-rnorm(10)
y1<-rnorm(10)
y2<-rnorm(10)
x2<-rnorm(10)
x1sum<-boxplot(x1)
y1sum<-boxplot(y1)
offset=4
x2sum<-boxplot(x2,at=median(y2)+offset,add=TRUE)
y2sum<-boxplot(y2+offset)
bxp(x1sum,at=median(y1),xlim=c(y1sum$stats[1],y2sum$stats[5]),
 ylim=c(min(c(x1sum$stats[1],x2sum$stats[1])),
 max(c(x1sum$stats[5],x2sum$stats[5]))),axes=FALSE)
bxp(y1sum,at=median(x1),add=TRUE,horizontal=TRUE,axes=FALSE)
bxp(x2sum,at=median(y2+offset),add=TRUE,axes=FALSE)
bxp(y2sum,at=median(x2),horizontal=TRUE,add=TRUE,axes=FALSE)
box()
axis(2)
axis(1,at=c(median(y1),median(y2)+offset),labels=c("Split1","Split2"))
[R] use subset to trim data but include last per category
Hello,

I bumped into the following funny use-case. I have too much data for a given plot. I have the following data frame df:

> str(df)
'data.frame': 5015 obs. of 5 variables:
 $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ iter       : int 10 20 30 40 50 60 70 80 90 100 ...
 $ Error      : num 1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
 $ Duality_Gap: num 20080 3789 855 443 321 ...
 $ Runtime    : num 0.00536 0.01353 0.01462 0.01571 0.01681 ...

But if I plot e.g. Runtime vs log(Duality Gap) I have too many observations, because a snapshot is taken every 10 iterations rather than, say, every 500, and the plot looks very cluttered. So I would like to trim the data frame to include only those records for which iter is a multiple of 500, and so I do this:

df <- subset(df, iter %% 500 == 0)

This gives me almost exactly what I need, except that the last and most important Duality Gap observations are of course gone due to the filtering. I would like to change the subset clause to be iter %% 500 _or_ the record is the last per n (n is my problem size and category in this case). How can I do that?

I thought of adding a new Boolean column "last" that flags whether a given row is the last element per category, but this is a bit too complicated. Is there a simpler condition construct that can be used with the subset command?

TIA,
Best regards,
Giovanni
Re: [R] Problem with duplicates in row.names
Thanks Arun,

I can manage something with that; I then just need to delete the first row with Photoshop!

Thanks
Fred

--
View this message in context: http://r.789695.n4.nabble.com/Problem-with-duplicates-in-row-names-tp4642518p4642604.html
[R] how to save a heatmap.2 in png /jpeg /tiff
Hey,

I am still working on my heat map (for those who have read my previous post about row.names). Now I would like to save my heatmap.2 output as .png or .tiff so that I can work on the picture in Photoshop, but it doesn't work. I am using (as I found on some forum)

png("heatmap.2.png) # and it just doesn't work.

When I try doing it with

jpeg("heatmap.2.jpeg) # it works once every 10 times, but it's a 22kb file. Completely useless!

I really need a high-quality image, as I will have to work on it in Photoshop and will also have to cut out and zoom in on just some lines of my heatmap.

# here is the code I use for my heatmap.2:
heatmap.2(a_matrix, Rowv=NA, Colv=NA, col=greenred(60), scale="column", margins=c(7,10), trace="none", density.info=c("none"))

Does someone know what I have to do in order to get my heatmap.2.png? Do I need some other package? (I only use gplots, for heatmap.2.)

Thanks for your help,
Fred
Re: [R] how to save a heatmap.2 in png /jpeg /tiff
Normally the workflow is:

png("heatmap.png") # don't forget the second quote, as you did below
my.plot.code # whatever you need to draw the heatmap
dev.off() # people often forget this step - did you?

You'll probably want to adjust the size and resolution settings for png() to get the desired high-resolution output; see ?png for details.

Sarah

On Sun, Sep 9, 2012 at 10:04 AM, STADLER Frederic frederic.stad...@unifr.ch wrote:
> Hey, I am still working on my heat map (for those who are read my previous post about row.names)
> Now, I would like to save my heat map.2 in .png or .tiff in order being able to work on the picture in photoshop, but it doesn't work. I'am using (as I have found on some forum)
>
> png("heatmap.2.png) # and it just doesn't work.
>
> when I try doing it with::
>
> jpeg("heatmap.2.jpeg) # it works once every 10 times, but it's a 22kb file. completely use less !!!
>
> I really need to have high quality image, as I will have to work on photoshop and also I will have to cut and zoom in just some lines of my heatmap.
>
> #here is the code I use for my heatmap.2 :
> heatmap.2(a_matrix, Rowv=NA, Colv =NA, col=greenred(60), scale="column", margins=c(7,10), trace="none", density.info=c("none"))
>
> Does someone know what I have to do in order to get my heatmap.2.png ??? Do I need some other package (I only use gplots, to allow the heatpmap.2)
> THANKS for your help
> Fred

--
Sarah Goslee
http://www.functionaldiversity.org
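The open-device, plot, close-device workflow described above can be sketched end to end as follows. This is only an illustration under stated assumptions: the matrix m is made up, the output path in tempdir() is hypothetical, base heatmap() stands in for gplots::heatmap.2() so the sketch has no package dependency, and the width, height and res values are arbitrary choices for a high-resolution image rather than recommended settings.

```r
# Made-up matrix standing in for the poster's a_matrix
set.seed(42)
m <- matrix(rnorm(200), nrow = 20,
            dimnames = list(paste0("gene", 1:20), paste0("s", 1:10)))

# Hypothetical output location; getwd() is where a bare file name would land
out.file <- file.path(tempdir(), "heatmap_example.png")

# 1. Open the file device FIRST: 3000 x 3000 pixels at 300 dpi (10 x 10 inches)
png(out.file, width = 3000, height = 3000, res = 300)

# 2. Draw the plot; it goes to the file, not to the screen device
heatmap(m, Rowv = NA, Colv = NA, scale = "column", margins = c(7, 10))

# 3. Close the device so the file is actually finished and written
dev.off()

print(file.exists(out.file))
```

Calling png() after the plot has already been drawn on screen does not copy that plot into the file; the plotting call has to come between png() and dev.off().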
Re: [R] how to save a heatmap.2 in png /jpeg /tiff
"It just doesn't work" could mean anything... and for those of us for whom it does work, that leaves a lot of possible differences between your case and ours. This is your cue to read the Posting Guide.

Some issues I have encountered:

If you are using Windows, and you have opened a graphics file in an editor, and you try to write a new version out with R, the editor will prevent this change in most cases. You have to remember to close the graphics file first.

Also, you need to remember to close the file in R using dev.off() when you are done writing to it for similar reasons.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
Re: [R] use subset to trim data but include last per category
dfthin <- df[ c(which(df$iter %% 500 == 0), nrow(df)), ]

or

dfthin <- subset(df, (iter %% 500 == 0) | (seq.int(nrow(df)==nrow(df))))

N.B. You should avoid using the name df for your variables, because it is the name of a built-in function that you are hiding by doing so. Others may be confused, and eventually you may want to use that function yourself. One solution is to use DF for your variables... another is to use more descriptive names.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
Re: [R] use subset to trim data but include last per category
Hi Jeff,

Thanks for your help, but this doesn't work; there are two problems. First and most important, I need to keep the last record _per category_, where my category is n, and not the last one globally. Second, there seems to be an issue with the subset variation, which ends up not filtering anything ... but this is a minor thing.

Best.
Giovanni

On Sep 9, 2012, at 5:59 PM, Jeff Newmiller wrote:
> dfthin <- df[ c(which(iter %% 500 == 0),nrow(df) ]
> or
> dfthin <- subset(df, (iter %% 500 == 0) | (seq.int(nrow(df)==nrow(df)))
Re: [R] use subset to trim data but include last per category
> I would like to change the subset clause to be iter %% 500 _or_ the record is the last per n

If your data.frame df is sorted by n you can define the function

isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

and use it as

subset(df, iter %% 500 == 0 | isLastInRun(n))

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
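To see the run-boundary helper in action, here is a small self-contained sketch. The data frame DF is invented for illustration (two problem sizes with snapshots every 10 iterations); only isLastInRun() and the subset() condition come from the reply above.

```r
# Helper from the reply above: TRUE at the last element of each run of equal values
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Made-up stand-in for the poster's data: two categories, a snapshot every 10 iterations
DF <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(103, 57))),
  iter = c(seq(10, 1030, by = 10), seq(10, 570, by = 10))
)

# The helper relies on rows being grouped by n, so sort first
DF <- DF[order(DF$n), ]

# Keep every 500th iteration plus the final record of each category
thinned <- subset(DF, iter %% 500 == 0 | isLastInRun(n))
print(thinned)
```

With this toy data the result keeps iterations 500, 1000 and 1030 for n = 1000, and 500 and 570 for n = 2000.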
Re: [R] use subset to trim data but include last per category
Hello,

This solves my problem in a horribly inelegant way that works:

df <- data.frame(n=newInput$n, iter=newInput$iter, Error=newInput$Error, Duality_Gap=newInput$Duality, Runtime=newInput$Acc)
df_last <- aggregate(x=df$iter, by=list(df$n), FUN=max)
names(df_last)[names(df_last)=="Group.1"] <- "n"
names(df_last)[names(df_last)=="x"] <- "iter"
#      n  iter
# 1 1000  2518
# 2 2000  5700
# 3 3000 10026
# 4 4000 13916
# 5 5000 17962
df$last <- FALSE
df$last[df$n == 1000 & df$iter == 2518] <- TRUE
df$last[df$n == 2000 & df$iter == 5700] <- TRUE
df$last[df$n == 3000 & df$iter == 10026] <- TRUE
df$last[df$n == 4000 & df$iter == 13916] <- TRUE
df$last[df$n == 5000 & df$iter == 17962] <- TRUE
df <- subset(df, (iter %% 500 == 0) | (df$last == TRUE))

How can I do the same without hardwiring these numbers?

TIA,
Best regards,
Giovanni
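One way to avoid the hardwired numbers, sketched here with base R only, is to let ave() compute the per-category maximum of iter for every row and compare against it. This is an alternative to the aggregate() approach above, not the method any poster in the thread used, and the data frame DF is a made-up stand-in.

```r
# Made-up stand-in for the poster's data frame
DF <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(4, 3))),
  iter = c(10, 500, 1000, 1018, 10, 500, 700)
)

# ave() returns, for each row, the maximum iter within that row's n group,
# so the comparison flags the last iteration of every category
DF$last <- DF$iter == ave(DF$iter, DF$n, FUN = max)

thinned <- subset(DF, iter %% 500 == 0 | last)
print(thinned)
```

Unlike the run-based helper, this does not require the rows to be sorted by n, but it does assume the last record of a category is the one with the largest iter.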
Re: [R] [Rscript] difficulty passing named arguments from commandline
https://github.com/TomRoche/GEIA_to_netCDF/commit/62ad6325d339c61ac4e7de5e7d4d26fa21ed918c
>> - Rscript ./netCDF.stats.to.stdout.r netcdf.fp=./GEIA_N2O_oceanic.nc var.name=emi_n2o # fails
>> + Rscript ./netCDF.stats.to.stdout.r 'netcdf.fp="./GEIA_N2O_oceanic.nc"' 'var.name="emi_n2o"' # succeeds

https://stat.ethz.ch/pipermail/r-help/2012-September/323287.html
> The trailing arguments to Rscript, generally read by commandArgs(TRUE), come into R as a vector of character strings. Your script can interpret those character strings in many ways. The [script linked above] processed them all with eval(parse(text=arg[i])) so all the arguments had to be valid R expressions: strings must be quoted, unquoted things are treated as names of R objects, slash means division, = and <- mean assignment, etc.

That explains the need for strict quoting--thanks.

> If that is a problem, don't use parse() to interpret the strings; use sub() or strsplit() to extract substrings and do what you want with them. (This is somewhat safer than using eval(parse(text=)) because it can do less.)

Assigning arguments via strsplit() does seem to be more of a PITA, but it works now @

https://github.com/TomRoche/GEIA_to_netCDF/blob/master/netCDF.stats.to.stdout.r

Your assistance is appreciated,
Tom Roche tom_ro...@pobox.com
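The strsplit() route suggested above might look roughly like the following. This is a minimal sketch, not the code in the linked repository: the function name parse_named_args is hypothetical, and the two example arguments are the ones from the failing command line in this thread.

```r
# Hypothetical helper: split "name=value" strings into a named list
# without ever evaluating them as R code.
parse_named_args <- function(args) {
  parts <- strsplit(args, "=", fixed = TRUE)
  keys <- vapply(parts, function(p) p[1], character(1))
  # Re-join everything after the first "=" so a value may itself contain "="
  values <- lapply(parts, function(p) paste(p[-1], collapse = "="))
  setNames(values, keys)
}

# Inside a script run via Rscript, the input would instead be
# commandArgs(trailingOnly = TRUE)
opts <- parse_named_args(c("netcdf.fp=./GEIA_N2O_oceanic.nc", "var.name=emi_n2o"))
print(opts$netcdf.fp)
print(opts$var.name)
```

Because the values stay plain character strings, no shell or R quoting is needed on the command line; anything that should be numeric has to be converted explicitly, for example with as.numeric().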
[R] PCA legend outside of PCA plot
Hi All,

I have been trying to plot my PCA legend outside of the PCA plot, but success still eludes me. Can you please advise how I can achieve this? I used locator() to obtain coordinates below the Comp.1 axis. Using these coordinates, the legend disappears. Below is the code for the PCA and legend.

Thanks in advance for the help.

Regards
Tinus

r.cols <- rainbow(length(unique(SEData$Seamount)))
pca1 <- princomp(SEData3, scores=TRUE, cor=TRUE)
biplot(pca1, var.axes=TRUE, xlabs=rep("",nrow(SEData3)), main="Seamounts PCA")
rrr <- apply(pca1$scores[,1:2], 2, range)
par(usr=as.vector(rrr))
points(pca1$scores[,1:2], col=r.cols, pch=20)
legend(-8, 2.95, sm, col = r.cols, text.col = "black", lty = NULL, pch = 20, horiz = FALSE)

--
M.J. Sonnekus
PhD Candidate (The Phytoplankton of the southern Agulhas Current Large Marine Ecosystem (ACLME))
Department of Botany
South Campus
Nelson Mandela Metropolitan University
PO Box 77000
Port Elizabeth
South Africa
6031
Cell: 082 080 9638
E-mail: tsonne...@gmail.com
Re: [R] Error msg in rpanel
If I run the whole code, click on the Addition menu and then click Add, the error appears, but not when I just call add(). So I guess the problem is not with the rpanel package. I also tried panel["vars"] instead of panel$vars, with no luck: I get the same error when I call the add function from the rpanel GUI.

--
View this message in context: http://r.789695.n4.nabble.com/Error-msg-in-rpanel-tp4642603p4642616.html
Re: [R] how to save a heatmap.2 in png /jpeg /tiff
Hey Sarah,

Thanks for your help! Of course I put the second quote also (I forgot to put it in the last post).

Sorry, I don't get the my.plot.code part... I'm new to R and use it only to draw heatmaps right now.

Well, I did forget the dev.off(). But when Quartz is turned off I get "null device 1", and when it's on I get "quartz 2". I still don't have any file called heatmap.2.png on my computer. I really don't understand why I don't get anything!

And when I do

jpeg("heatmap.jpg")

it works, but I get only a 20kb picture, which is useless in my case (I want to edit and work on it in Photoshop).

Fred

--
View this message in context: http://r.789695.n4.nabble.com/how-to-save-a-heatmap-2-in-png-jpeg-tiff-tp4642607p4642615.html
Re: [R] how to save a heatmap.2 in png /jpeg /tiff
Hey Jeff,

Sorry for the "it just doesn't work", but that's really what it does... I can't find any file called heatmap.2.png on my computer after creating my heatmap.2 (which I can see in Quartz) and then typing:

png("heatmap.2.png")

Maybe I should add that I am working on a Mac with Mac OS X (v.10.8.1). It is not a problem of having a picture editor open when I try to create this file.

Any other idea?

Thanks
Fred

--
View this message in context: http://r.789695.n4.nabble.com/how-to-save-a-heatmap-2-in-png-jpeg-tiff-tp4642607p4642617.html
Re: [R] how to save a heatmap.2 in png /jpeg /tiff
Mr Stadler,

On 9 September 2012 10:36, Fred frederic.stad...@unifr.ch wrote:
> But I don't have any files called heatmap.2.png on my computer. I really don't understand why I don't get anything !

What does getwd() print out as a path? Check there for your file.

--
H
Re: [R] PCA legend outside of PCA plot
Try adding the parameter xpd=TRUE to your legend() statement. Without reproducible code it is pretty hard to be sure what the problem is.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
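A minimal sketch of the xpd=TRUE suggestion, under stated assumptions: the scores matrix and the two group labels are invented stand-ins for pca1$scores and the seamount names, the right margin is widened with par(mar = ...) to make room (an addition not mentioned in the reply), and the plot is drawn on an off-screen pdf(NULL) device only so that the sketch runs without a display.

```r
# Made-up scores and group labels standing in for pca1$scores and the seamounts
set.seed(1)
scores <- matrix(rnorm(40), ncol = 2)
groups <- factor(rep(c("A", "B"), each = 10))
grp.cols <- rainbow(nlevels(groups))

pdf(NULL)                              # off-screen device; no file is written
old.par <- par(mar = c(5, 4, 4, 8))    # widen the right margin for the legend

plot(scores, col = grp.cols[groups], pch = 20, xlab = "Comp.1", ylab = "Comp.2")

usr <- par("usr")                      # plot-region limits: x1, x2, y1, y2

# xpd = TRUE switches off clipping to the plot region, so a legend placed
# at the right-hand edge can extend into the margin instead of vanishing
leg <- legend(x = usr[2], y = usr[4], legend = levels(groups),
              col = grp.cols, pch = 20, xpd = TRUE)

par(old.par)
dev.off()
```

Without xpd = TRUE, anything drawn beyond the limits in par("usr") is clipped, which would explain a legend that "disappears" when given coordinates below or beside the axes.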
[R] Sum of column from another df based of row values of df1
Dear All,

I need to sum a column from another dataframe based on the row values of one dataframe. I am stuck in a loop trying to accomplish it, and at the current speed it will take more than 80 hours to complete. Needless to say, I am looking for a more elegant and quicker solution. I really need some help here.

Here is the issue: I have a dataframe CALL (the dput of its head is given below) which has close to a million rows. Two date columns are of importance, DATE and EXPDATE. There is another dataframe, VOL (dput of its head also given), which has 2 columns, DATE and VOL. It holds the volatility for each day and has a total of 124 records (corresponding to 6 months).

I want to add another column to the CALL dataframe containing the sum of all the volatilities from the VOL dataframe for the period specified by the interval between DATE and EXPDATE in each row of CALL.

For example: in the first row, DATE is '03-01-2011' and EXPDATE is '27-01-2011'. So I want the SUM column (a new column in CALL) to contain the sum of the volatilities of 03-01, 04-01, 05-01 and so on until 27-01 from the VOL dataframe. I have to repeat this process for all the rows in the dataframe.

Here is the for-loop version of the solution:

for (k in 1:nrow(CALL)) {
  CALL$SUM[k] = sum(subset(VOL$VOL, VOL$DATE >= CALL$DATE[k] & VOL$DATE <= CALL$EXPDATE[k]))
}

The loop will run close to a million times. It has been running for more than 10 hours and it's just 12% complete. It would take more than 80 hours to complete, not to mention the toll it would take on my laptop.

So is there a better way to accomplish this task? Any input would be greatly appreciated. Below are the dputs of the two dataframes. One point of note is that there are only 124 DISTINCT values of DATE and 6 DISTINCT values of EXPDATE, in case that can be used in some way.

dput(CALL)
structure(list(NAME = c("STK", "STK", "STK", "STK", "STK", "STK"),
    EXPDATE = structure(c(15029, 15029, 15029, 15029, 15029, 15029), class = "Date"),
    STRIKE = c(6300L, 6300L, 6300L, 6300L, 6300L, 6300L),
    TMSTMP = c("14:18:36", "15:23:42", "15:22:30", "15:24:13", "15:22:07", "15:22:27"),
    PRICE = c(107, 102.05, 101.3, 101.5, 101.2, 101.2),
    QUANT = c(1850L, 2000L, 2000L, 1700L, 2000L, 2000L),
    DATE = structure(c(14977, 14977, 14977, 14977, 14977, 14977), class = "Date"),
    DTTM = structure(c(1294044516, 1294048422, 1294048350, 1294048453,
        1294048327, 1294048347), class = c("POSIXct", "POSIXt"), tzone = ""),
    TTE = c(38, 38, 38, 38, 38, 38)),
    .Names = c("NAME", "EXPDATE", "STRIKE", "TMSTMP", "PRICE", "QUANT", "DATE", "DTTM", "TTE"),
    row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

dput(VOL)
structure(list(DATE = structure(c(1293993000, 1294079400, 1294165800,
        1294252200, 1294338600, 1294597800), class = c("POSIXct", "POSIXt"), tzone = ""),
    VOL = c(2.32666706461792e-05, 6.79164443640051e-05, 5.66390788200039e-05,
        7.25422438459608e-05, 0.000121727951296865, 0.000216076713994619)),
    .Names = c("DATE", "VOL"), row.names = c(NA, 6L), class = "data.frame")

Please do let me know if any more information from my side would help or if I need to explain the issue more clearly. Any minor improvement will be a great help.

Thanks in advance.
-Shivam

--
*Victoria Concordia Crescit*
Re: [R] Sum of column from another df based of row values of df1
How about an improvement to 16 seconds?

The first thing to do is to convert your data to a matrix, because accessing data in a dataframe is very expensive. If you run Rprof on your code you will see that all the time is spent retrieving the information. Converting to a matrix and using matrix indexing is considerably faster. I also converted the POSIXct to Date; you were paying a lot for the constant conversion of POSIXct to Date in your comparisons.

I just replicated your CALL to 1 million rows for testing.

CALL <-
structure(list(NAME = c("STK", "STK", "STK", "STK", "STK", "STK"),
    EXPDATE = structure(c(15029, 15029, 15029, 15029, 15029, 15029), class = "Date"),
    STRIKE = c(6300L, 6300L, 6300L, 6300L, 6300L, 6300L),
    TMSTMP = c("14:18:36", "15:23:42", "15:22:30", "15:24:13", "15:22:07", "15:22:27"),
    PRICE = c(107, 102.05, 101.3, 101.5, 101.2, 101.2),
    QUANT = c(1850L, 2000L, 2000L, 1700L, 2000L, 2000L),
    DATE = structure(c(14977, 14977, 14977, 14977, 14977, 14977), class = "Date"),
    DTTM = structure(c(1294044516, 1294048422, 1294048350, 1294048453,
        1294048327, 1294048347), class = c("POSIXct", "POSIXt"), tzone = ""),
    TTE = c(38, 38, 38, 38, 38, 38)),
    .Names = c("NAME", "EXPDATE", "STRIKE", "TMSTMP", "PRICE", "QUANT", "DATE", "DTTM", "TTE"),
    row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

VOL <-
structure(list(DATE = structure(c(1293993000, 1294079400, 1294165800,
        1294252200, 1294338600, 1294597800), class = c("POSIXct", "POSIXt"), tzone = ""),
    VOL = c(2.32666706461792e-05, 6.79164443640051e-05, 5.66390788200039e-05,
        7.25422438459608e-05, 0.000121727951296865, 0.000216076713994619)),
    .Names = c("DATE", "VOL"), row.names = c(NA, 6L), class = "data.frame")

# convert to matrices for faster testing
mCALL <- cbind(CALL$DATE, CALL$EXPDATE)
mVOL <- cbind(as.Date(VOL$DATE), VOL$VOL)  # convert POSIXct to Date

# create 1M rows in mCALL
mCALL <- rbind(mCALL, mCALL[rep(1L, 1e6), ])
result <- numeric(nrow(mCALL))

system.time({
    for (i in 1:nrow(mCALL)) {
        result[i] <- sum(mVOL[(mVOL[, 1L] >= mCALL[i, 1L])
                              & (mVOL[, 1L] <= mCALL[i, 2L]), 2L])
    }
})
   user  system elapsed
  15.94    0.00   16.07
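A further possibility, not proposed in the thread itself, is to drop the loop altogether. Since each row needs the sum of VOL over a contiguous date range, one could precompute running totals once and look up both ends of every range with findInterval(). The sketch below is only an illustration under assumptions: the vol and calls data frames are synthetic stand-ins for VOL and CALL, both date columns are taken to be whole-day Date values, and no timings are claimed.

```r
# Synthetic stand-ins for the poster's VOL and CALL (names and values invented).
set.seed(1)
vol <- data.frame(DATE = as.Date("2011-01-03") + 0:123,
                  VOL  = runif(124, 0, 1e-3))
n <- 1e5
start <- sample(vol$DATE, n, replace = TRUE)
calls <- data.frame(DATE    = start,
                    EXPDATE = start + sample(0:40, n, replace = TRUE))

# Running totals over VOL sorted by date; cs[1] is the empty sum (0).
vol <- vol[order(vol$DATE), ]
d   <- as.integer(vol$DATE)          # whole-day numbers since 1970-01-01
cs  <- c(0, cumsum(vol$VOL))

# findInterval(x, d) counts how many entries of the sorted vector d are <= x.
hi <- findInterval(as.integer(calls$EXPDATE), d)    # vol dates <= EXPDATE
lo <- findInterval(as.integer(calls$DATE) - 1L, d)  # vol dates <  DATE

# Sum of VOL over [DATE, EXPDATE] is the difference of two running totals.
calls$SUM <- cs[hi + 1L] - cs[lo + 1L]

# Spot-check the first rows against the direct subset-and-sum definition.
check <- sapply(1:50, function(k)
  sum(vol$VOL[vol$DATE >= calls$DATE[k] & vol$DATE <= calls$EXPDATE[k]]))
stopifnot(isTRUE(all.equal(calls$SUM[1:50], check)))
```

With the real data, VOL$DATE is POSIXct, so it would first need converting to Date in the right time zone before this lookup gives the intended day boundaries.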
Re: [R] how to save a heatmap.2 in png /jpeg /tiff
Hello,

On 09-09-2012 18:36, Fred wrote:
> hey Sarah,
> thanks for your help !!
> Of course I put the second quote also (I forgot to put it in the last post).
> Sorry, I don't get the my.plot.code...

What Sarah meant is that you must put your plotting code between the instructions that open and close the graphics device. This is example 1 from the 'gplots::heatmap.2' help page, adapted.

# From ?gplots::heatmap.2
library(gplots)
data(mtcars)
x <- as.matrix(mtcars)

# Plot nothing, but as Jeff suggested it still does something:
# it opens the device and closes it
png(filename = "myplot.png", bg = "transparent")
dev.off()  # 318-byte file in the current directory

# Plot a heatmap.2, example 1 in ?gplots::heatmap.2
png(filename = "heatmap2.png")
heatmap.2(x)  ## default - dendrogram plotted and reordering done.
dev.off()  ## 10Kb file in current dir

# The same but to a jpeg graphics device
jpeg(filename = "heatmap2.jpeg")
heatmap.2(x)  ## same as above
dev.off()  ## 46Kb file

Hope this helps,

Rui Barradas

> I'm new to R and use it only to draw heatmaps right now. Well, I did forget the dev.off(),
> but I got "null device 1" when quartz is turned off, and when it's on I get "quartz 2".
> But I don't have any file called heatmap.2.png on my computer. I really don't understand why I don't get anything!
> And when I do
> jpeg("heatmap.jpg")
> it works, but I get only a 20kb picture, which is useless in my case (I need to edit and work on it in Photoshop).
>
> Fred
Re: [R] Sum of column from another df based of row values of df1
Thanks a lot Jim, it works a treat. I just had to change the date format in mCALL as well. You saved me 80 hours of fretting and frustration, and I am really thankful for it.

Regards,
Shivam
Re: [R] Sum of column from another df based of row values of df1
Just to add: I did not know that the speed of data access differs so much between matrices and dataframes. This is one to remember for the future. Thanks again Jim :)

-Shivam
Re: [R] how to save a heatmap.2 in png /jpeg /tiff
On Sep 9, 2012, at 7:04 AM, STADLER Frederic wrote:

> Hey,
> I am still working on my heat map (for those who read my previous post about row.names)...
> Now I would like to save my heatmap.2 as .png or .tiff so that I can work on the picture in Photoshop, but it doesn't work.
> I am using (as I found on some forum)
> png("heatmap.2.png)
> and it just doesn't work. When I try doing it with
> jpeg("heatmap.2.jpeg)
> it works once every 10 times, but it's a 22kb file. Completely useless!!!

Neither of those should have _ever_ worked, since they are both missing closing quotes. Furthermore, just issuing the command jpeg("filename.jpg"), even with proper closing quotes, will be completely useless, as you say, unless you follow the plot() command with dev.off().

?Devices
?jpeg  # and please DO the examples

> I really need a high-quality image, as I will have to work on it in Photoshop, and I will also have to cut out and zoom in on just some lines of my heatmap.
> Here is the code I use for my heatmap.2:
> heatmap.2(a_matrix, Rowv=NA, Colv=NA, col=greenred(60), scale="column", margins=c(7,10), trace="none", density.info=c("none"))
> Does someone know what I have to do in order to get my heatmap.2.png???
> Do I need some other package? (I only use gplots, for heatmap.2.)

Please include complete code. What you have provided so far should, as you say, be "completely useless!!!".

David Winsemius, MD
Alameda, CA, USA
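On the "high quality" part of the question, one option worth trying is to give the device explicit dimensions and a resolution, since the default png() and jpeg() sizes are fairly small. The sketch below is only a suggestion under assumptions: base heatmap() stands in for gplots::heatmap.2() so that it runs without extra packages, mtcars replaces a_matrix, and the 8 x 8 inch size, 300 dpi and output file name are arbitrary choices.

```r
# mtcars stands in for the poster's a_matrix; base heatmap() stands in for
# gplots::heatmap.2(), which would be called at the same place.
x <- as.matrix(mtcars)

# Illustrative output path in the session's temporary directory.
out <- file.path(tempdir(), "heatmap_hires.png")

# Open the device with an explicit size and resolution (8 x 8 in at 300 dpi),
# draw the plot, then close the device so the file is actually written.
png(filename = out, width = 8, height = 8, units = "in", res = 300)
heatmap(x, Rowv = NA, Colv = NA, scale = "column", margins = c(7, 10))
dev.off()

file.exists(out)   # TRUE once dev.off() has written the file
```

tiff() accepts the same width, height, units and res arguments, so swapping the device call should be enough if a TIFF is preferred for Photoshop.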