Re: [R] Performance (speed) of ggplot
You are asked in the Posting Guide to provide a small, reproducible example. Your function is Byzantine, and depends on who knows what, and I can't even see what your end product is intended to be. I will say that I think you have missed the point with your approach to using ggplot... you might well do better with base graphics if coding each display element is necessary for your application. ggplot is intended for a data-driven approach, where you set up data frames that contain the bulk of your graphic information, and then you should rarely need more than one call for each type of graphic element in a given plot. That said, ggplot can be noticeably slow sometimes, so I cannot predict whether you will achieve your stated speed goal by reconfiguring the code at this point. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On September 25, 2014 11:30:08 AM PDT, Christopher Battles cbatt...@startllc.com wrote: Hello list, I have been working on learning ggplot for its extraordinary flexibility compared to base plotting and have been developing a function to create a Minitab-like process capability chart. *sigh* some of the people I interface with can only understand the data when it is presented in Minitab format The function creates a ggplot container to hold 10 ggplot items which are the main process capability chart, a Q-Q plot, and the text boxes with all the capabilities data. When I run the function, the elapsed time is on the order of 3 seconds, the gross majority of which is user time. sys time is very small. A bit of hacking shows that the calls to gt1 - ggplot_gtable(ggplot_build(p)), etc., each take on the order of 1/3 of a second. These times are on a 3.2GHz Xeon workstation. I'd like to see the entire function complete in less than a second. My questions are: 1) Am I misusing ggplot, hence the performance hit? 2) Is there any way to increase the speed of this portion of the code? 3) Am I simply asking ggplot to crunch so much that it is inevitable that it will take a while to process? To that end, the function, vectis.cap(), can be downloaded from http://pastebin.com/05s5RKYw . It runs to 962 lines of code, so I won't paste it here. The offending ggplot_gtable calls are at lines 909 - 918. Usage: vectis.cap(chickwts$weight, target = 300, USL = 400, LSL = 100) Thank you, Christopher Battles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] error in rownames
Hi. Partly. You got response from David regarding HTML formating. I polished scrammbled text but I got AsciiGridImpute(type.rf, xfile, outfile) Error in file(xfiles[[i]], open = rt) : cannot open the connection In addition: Warning message: In file(xfiles[[i]], open = rt) : cannot open file 'sinaspect.asc': No such file or directory So still something is missing. Regards Petr -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Chris Jackson-Jordan Sent: Thursday, September 25, 2014 5:57 PM To: Adams, Jean Cc: R help Subject: Re: [R] error in rownames Sorry to respond again so quickly, but is this what you meant? y - subset(training, select = c(ResponseSu)) x - subset(training, select = c(sinaspect, habitat, elevation, disttowat, disttoroad, slope, cosaspect)) type.rf - yai(x=x, y=y, method=randomForest, rfMode=regression, ntree=20) outfile - list(Type=RespSurf_Reg.asc) xfile -list(sinaspect=sinaspect.asc, habitat=habitat.asc, elevation=elevation.asc,+ disttowat=disttowat.asc, disttoroad=disttoroad.asc, slope=slope.asc,+ cosaspect=cosaspect.asc) dput(training[1:60, ])structure(list(CID = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), cosaspect = c(-0.546873, 0.57, 0.796074, -0.906097, 0.546751, -0.903426, 0.55381, 0.761338, 0.858955, -0.784384, 0.790582, 0.0429794, 0.89785, 0.998368, 0.991524, -0.769041, -0.104549, -0.565003, -0.56414, 0.432783, -0.652663, -0.477157, -0.642625, 0.0719997, 0.0202113, -0.88, -0.38204, -0.3565, -0.199093, 0.545715, -0.99322, 0.852507, -0.524166, 0.999378, -0.980175, -0.922242, -0.954118, -0.902187, -0.354786, 0.112082, -0.198151, -0.502453, -0.676458, -0.0673507, -0.999526, -0.840507, 0.989883, -0.832413, -0.733279, -0.993005, -0.479475, -0.326237, -0.848538, -0.546614, -0.986488, -0.948983, -0.993527, -0.964193, -0.971359, -0.287455), disttoroad = c(368.767, 880.661, 563.239, 72.1515, 227.697, 313.486, 265.001, 227.697, 219.137, 518.349, 23.048, 10.3074, 41.2294, 103.074, 497.857, 494.968, 524.968, 674.01, 10.3074, 133.996, 10.3074, 10.3074, 10.3074, 20.6147, 226.058, 10.3074, 272.122, 226.762, 177.933, 204.075, 82.4589, 61.8442, 14.5768, 20.6147, 29.1536, 92.1918, 75.0387, 113.849, 133.996, 46.0959, 123.688, 120.645, 185.533, 133.996, 61.8442, 103.074, 120.203, 154.61, 138.671, 150.077, 51.5368, 190.058, 72.884, 20.6147, 115.24, 30.9221, 83.1006, 333.041, 30.9221, 288.606), disttowat = c(981.15, 1448.98, 729.496, 509.146, 171.549, 322.671, 566.998, 631.783, 783.359, 407.76, 1036.59, 352.264, 306.113, 235.27, 241.289, 269.769, 342.633, 660.476, 578.04, 1844.99, 1753.65, 408.801, 543.951, 756.661, 632.875, 540.031, 927.09, 1120.47, 1166.24, 65.9993, 474.586, 437.911, 162.974, 576.66, 557.932, 547.844, 793.667, 420.586, 208.708, 10.3074, 497.857, 383.595, 371.78, 130.379, 449.406, 111.014, 162.974, 393.574, 463.831, 360.021, 404.096, 43.7304, 140.195, 182.647, 277.343, 84.9967, 87.4609, 633.546, 60.1017, 42.4983), elevation = c(2000.9, 2198.18, 2194.78, 1863.8, 1902.25, 1867.29, 1943.94, 1912.35, 1920.11, 1870.94, 1914.56, 1904.4, 1900.92, 1893.94, 1888.02, 1889.71, 1890.95, 1894.32, 1896.11, 2011.99, 2025.01, 1902.45, 1901.99, 2030.09, 2038.39, 1910.48, 2054.46, 1918.92, 1951.15, 1941.24, 1853.78, 1863.76, 1875.9, 1863.6, 1864.91, 1864.18, 1872.85, 1866.81, 1863.06, 1896.47, 1863.95, 1873.13, 1873.44, 1888.06, 1874.35, 1888.06, 1911.09, 1872.43, 1854.83, 1866.98, 1876.95, 1875.88, 1875.19, 1888.27, 1900.82, 1870.49, 1903.17, 1871.52, 1902.28, 1862.18), habitat = c(14L, 1L, 9L, 2L, 5L, 2L, 5L, 1L, 1L, 2L, 1L, 4L, 10L, 8L, 2L, 2L, 2L, 2L, 4L, 10L, 5L, 4L, 4L, 3L, 5L, 4L, 3L, 3L, 7L, 6L, 2L, 2L, 2L, 13L, 2L, 2L, 2L, 2L, 2L, 11L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 10L, 2L, 2L, 2L, 2L, 2L), sinaspect = c(-0.837215, -0.942801, -0.6052, 0.42307, -0.837295, -0.428745, -0.832643, -0.648355, -0.512051, -0.620276, -0.612357, -0.999076, -0.440302, -0.0571153, 0.129924, -0.6392, -0.99452, -0.825089, 0.825679, -0.901498, -0.757648, 0.878818, 0.766181, -0.997405, -0.999796, 0.895785, 0.924146, 0.934295, 0.979981, -0.837971, 0.11625, -0.522717, -0.851616, -0.0352723, 0.198133, -0.386614, -0.29943, -0.431345, -0.934948, -0.993699, -0.980172, -0.864605, -0.736481, 0.997729, -0.0307994, -0.541801, 0.141883, -0.554156, 0.679928, 0.118074, -0.877556, -0.945288, -0.529134, -0.837385, -0.163837, -0.315329, -0.113595, 0.265202, -0.237618, -0.957794), slope = c(28.8435, 2.74077, 34.8524, 0.151366, 18.6671, 0.132943, 17.6102, 16.0996, 9.36435, 0.452752, 1.9657, 0.996516, 1.74535, 0.148506, 0.0372119, 0.695029, 0.455934, 0.514605, 12.6356, 16.0089, 8.21627, 1.038, 0.498378, 3.6306, 4.11395, 10.3384, 15.5258, 16.5868, 30.8142,
Re: [R] Median of streaming data
Rolf Turner r.tur...@auckland.ac.nz on Thu, 25 Sep 2014 11:44:38 +1200 writes: On 24/09/14 20:16, Martin Maechler wrote: SNIP 1) has your proposal ever been provided in R? I'd be happy to add it to the robustX (http://cran.ch.r-project.org/web/packages/robustX) or even robustbase (http://cran.ch.r-project.org/web/packages/robustbase) package. SNIP I have coded up the algorithm from the Cameron and Turner paper. Dunno if it gives exactly the same results as my (Splus?) code from lo these many years ago (the code that is lost in the mists of time), but it *seems* to work. excellent, thank you, Rolf! It is not designed to work with actual streaming data --- I don't know how to do that. It takes a complete data vector as input. Someone who knows about streaming data should be able to adapt it pretty easily. Said he, the proverbial optimist. I agree; that should not be hard. One way is to replace 'y[ind]' by 'getY(ind)' everywhere in the code and let 'getY' be an argument to rlas() provided by the user. The function code and a help file are attached. These files have had their names changed to end in .txt so that they will get through the mailing list processor without being stripped. With a bit of luck. ;-) It did work indeed. I've added them to 'robustX' -- on R-forge, including a plot() method and some little more flexibility. -- https://r-forge.r-project.org/R/?group_id=59 Thank you for all the other pointers to litterature (but none to software), some of which quite recent. One old idea that was not directly mentioned I think is the Remedian of Rouseeuw and Basset: Peter J. Rousseeuw and Gilbert W. Bassett, Jr. (1998) The Remedian: A Robust Averaging Method for Large Data Sets Journal of the American Statistical Association, Vol. 85, No. 409, pp. 97-104 [URL: http://www.jstor.org/stable/2289530] which is also easy to implement and I plan to add to robustbase (as I'd want to use the C code already in robustbase) as a reference estimator. Personally, I think there is quite some room for research and implementation, not the least because the litterature seems to always be a bit incomplete {one school not knowning about, or at least not citing works of the other school, etc...} Martin -- Martin Maechler, ETH Zurich If they *don't* get through, anyone who is interested should contact me and I will send them to you privately. cheers, Rolf -- Rolf Turner Technical Editor ANZJS external: rlas.R, plain text] external: rlas.Rd, plain text] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Performance (speed) of ggplot
Hi I will second Jeff. If you want several plots on one page it could be more convenient to use standard plot. see ?layout or ?split.screen for complex figures or ?par mfrow for simple layout. In all cases it is difficult to combine base and grid graphics though. Regards Petr -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Jeff Newmiller Sent: Friday, September 26, 2014 8:02 AM To: Christopher Battles; r-help@r-project.org Subject: Re: [R] Performance (speed) of ggplot You are asked in the Posting Guide to provide a small, reproducible example. Your function is Byzantine, and depends on who knows what, and I can't even see what your end product is intended to be. I will say that I think you have missed the point with your approach to using ggplot... you might well do better with base graphics if coding each display element is necessary for your application. ggplot is intended for a data-driven approach, where you set up data frames that contain the bulk of your graphic information, and then you should rarely need more than one call for each type of graphic element in a given plot. That said, ggplot can be noticeably slow sometimes, so I cannot predict whether you will achieve your stated speed goal by reconfiguring the code at this point. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On September 25, 2014 11:30:08 AM PDT, Christopher Battles cbatt...@startllc.com wrote: Hello list, I have been working on learning ggplot for its extraordinary flexibility compared to base plotting and have been developing a function to create a Minitab-like process capability chart. *sigh* some of the people I interface with can only understand the data when it is presented in Minitab format The function creates a ggplot container to hold 10 ggplot items which are the main process capability chart, a Q-Q plot, and the text boxes with all the capabilities data. When I run the function, the elapsed time is on the order of 3 seconds, the gross majority of which is user time. sys time is very small. A bit of hacking shows that the calls to gt1 - ggplot_gtable(ggplot_build(p)), etc., each take on the order of 1/3 of a second. These times are on a 3.2GHz Xeon workstation. I'd like to see the entire function complete in less than a second. My questions are: 1) Am I misusing ggplot, hence the performance hit? 2) Is there any way to increase the speed of this portion of the code? 3) Am I simply asking ggplot to crunch so much that it is inevitable that it will take a while to process? To that end, the function, vectis.cap(), can be downloaded from http://pastebin.com/05s5RKYw . It runs to 962 lines of code, so I won't paste it here. The offending ggplot_gtable calls are at lines 909 - 918. Usage: vectis.cap(chickwts$weight, target = 300, USL = 400, LSL = 100) Thank you, Christopher Battles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a jsou určeny pouze jeho adresátům. Jestliže jste obdržel(a) tento e-mail omylem, informujte laskavě neprodleně jeho odesílatele. Obsah tohoto emailu i s přílohami a jeho kopie vymažte ze svého systému. Nejste-li zamýšleným adresátem tohoto emailu, nejste oprávněni tento email jakkoliv užívat, rozšiřovat, kopírovat či zveřejňovat. Odesílatel e-mailu neodpovídá za eventuální škodu způsobenou modifikacemi či zpožděním přenosu e-mailu. V případě, že je tento e-mail součástí obchodního jednání: - vyhrazuje si odesílatel právo ukončit kdykoliv jednání o uzavření smlouvy, a to z jakéhokoliv důvodu i bez uvedení důvodu. - a obsahuje-li nabídku, je adresát oprávněn nabídku bezodkladně přijmout; Odesílatel tohoto e-mailu (nabídky) vylučuje přijetí nabídky ze strany příjemce s dodatkem či odchylkou. - trvá odesílatel na tom, že
Re: [R] Performance (speed) of ggplot
You are using ggplot2 very inefficiently. Many geom's plot only one data point. You can combine several of them in a single geom. Have a look at this gridExtra package which has some useful functions like grid.arrange and tableGrob. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Christopher Battles Verzonden: donderdag 25 september 2014 20:30 Aan: r-help@r-project.org Onderwerp: [R] Performance (speed) of ggplot Hello list, I have been working on learning ggplot for its extraordinary flexibility compared to base plotting and have been developing a function to create a Minitab-like process capability chart. *sigh* some of the people I interface with can only understand the data when it is presented in Minitab format The function creates a ggplot container to hold 10 ggplot items which are the main process capability chart, a Q-Q plot, and the text boxes with all the capabilities data. When I run the function, the elapsed time is on the order of 3 seconds, the gross majority of which is user time. sys time is very small. A bit of hacking shows that the calls to gt1 - ggplot_gtable(ggplot_build(p)), etc., each take on the order of 1/3 of a second. These times are on a 3.2GHz Xeon workstation. I'd like to see the entire function complete in less than a second. My questions are: 1) Am I misusing ggplot, hence the performance hit? 2) Is there any way to increase the speed of this portion of the code? 3) Am I simply asking ggplot to crunch so much that it is inevitable that it will take a while to process? To that end, the function, vectis.cap(), can be downloaded from http://pastebin.com/05s5RKYw . It runs to 962 lines of code, so I won't paste it here. The offending ggplot_gtable calls are at lines 909 - 918. Usage: vectis.cap(chickwts$weight, target = 300, USL = 400, LSL = 100) Thank you, Christopher Battles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can't seem to get lapply to work.
Jeff, Of course a data.frame is a list of columns. Duh! Sorry about that. Was working about 10 hours straight. Also sorry about the excess attributes. They were created by via expand.grid. Here is my current solution which seems to be working: lapply(1:nrow(tstSet), function(x, y, z) oneRun(y[x,],z), tstSet, inp) Thanks so much for your timely help, Best, KW -- On Sep 26, 2014, at 12:23 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: On suggestion is to tell us what you want to accomplish, because that error makes perfect sense to me and your intent is rather opaque. Data frames are not lists of rows, they are lists of columns. One solution could be tstSet[[term]] + 1 Another might be tstSet - within(tstSet,{term - term +1}) --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On September 25, 2014 8:25:44 PM PDT, Keith S Weintraub kw1...@gmail.com wrote: Folks, I have the following problem. tstSet-structure(list(corr = c(0.59, 0.62), term = c(7, 7), am = c(AmYes, AmNo), prem = c(19.5, 14.75)), .Names = c(corr, term, am, prem), out.attrs = structure(list(dim = structure(c(3L, 2L, 2L, 41L), .Names = c(corr, term, am, prem)), dimnames = structure(list( corr = c(corr=0.59, corr=0.62, corr=0.65), term = c(term=5, term=7), am = c(am=AmYes, am=AmNo), prem = c(prem=10.00, prem=10.25, prem=10.50, prem=10.75, prem=11.00, prem=11.25, prem=11.50, prem=11.75, prem=12.00, prem=12.25, prem=12.50, prem=12.75, prem=13.00, prem=13.25, prem=13.50, prem=13.75, prem=14.00, prem=14.25, prem=14.50, prem=14.75, prem=15.00, prem=15.25, prem=15.50, prem=15.75, prem=16.00, prem=16.25, prem=16.50, prem=16.75, prem=17.00, prem=17.25, prem=17.50, prem=17.75, prem=18.00, prem=18.25, prem=18.50, prem=18.75, prem=19.00, prem=19.25, prem=19.50, prem=19.75, prem=20.00 )), .Names = c(corr, term, am, prem))), .Names = c(dim, dimnames)), row.names = c(460L, 239L), class = data.frame) tstSet corr termam prem 460 0.597 AmYes 19.50 239 0.627 AmNo 14.75 A function: twoRun-function(aRow, inp) { aRow[[term]] + inp } When I try the following: lapply(tstSet, twoRun, 1) I get the following error: Error in aRow[[term]] : subscript out of bounds Any suggestions? Thanks much for your time, KW -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with conversion of variable labels (Hmisc and Epicalc)
For my epidemiological analysis I use the packages Epicalc and Hmisc – among others. Both packages allow to assign variable labels that will appear in the output of the respective packages’ own functions. The modes of storage of the labels in a dataframe are very different. I am wonderiung if there is a function that would allow to easily convert Epicalc labels to Hmisc labels and possibly the other way around. Unfortunately I am not adept enough in R to write such a function myself. -- Otto Pichlhöfer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Savitzky-Golay Smoother
Dear Paul and Henrik, I have a time series with some missing data points that i need smoothed using Savitzky-Golay filter. Related question was asked here http://thr3ads.net/r-help/2012/11/2121748-Savitzky-Golay-filtering-with-missing-data but no straight forward answer was posted. However, Henrik (cc'd here) did ask related question on smoothing for reflectance here http://thr3ads.net/r-help/2004/02/835137-Savitzky-Golay-smoothing-for-reflectance-data which i have as well been unable to follow up. I will be glad if you could assist me with some insights on the way forward or point to a relevant source of help. I have attached a simple test data plus the R code which only work with non missing data. I will appreciate any possible assistance. Thanks, Erick. -- *Erick Okuto, Ph.D.* * Candidate* *School of Mathematics Actuarial Science* *Jaramogi Oginga Odinga University of Science Technology (JOOUST)/ World Agroforestry Centre (ICRAF), **Climate Change Unit, Nairobi-Kenya. * Voice: +254207224154Mobile: +254725005276Skype id: erickokuto Email: erickok...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [rJava] RJavaClassLoader and system classloader
Hi everyone, I want to call a Java application from R and have encountered some problems with the way rJava deals with the system class loader. To run my application, I use the following R script: library(rJava) .jinit() .jaddClassPath(myApp.jar) rWrapper - .jnew(org/test/RWrapper) .jcall(rWrapper,V,start) My Java application has a plugins loading mechanism that uses a specific PluginClassLoader to load plugins stored in additional JAR files (e.g. plugin.jar). This PluginClassLoader is programed so that it knows and loads the plugins JARs. As any Java classloader, it is a child of the system class loader. Finally, plugins not only use classes stored in their plugin.jar file, they also depend on classes contained in the main myApp.jar file (the Plugin interface is for example defined in the myApp.far) In a pure Java environment, myApp.jar is known from the system class loader (it is in the classpath) and thus the PluginClassLoader can load classes from the plugin (it knows both about the plugin.jar and the myApp.jar files) In the R context however, using the above script, its the RJavaClassLoader that knows about the myApp.jar file. The org.test.RWrapper class is instatiated by the RJavaClassLoader. The system class loader however does not know about myApp.jar anymore. Neither does the PluginClassLoader. So when the PluginClassLoader loads a plugin class, it can only load classes that are in the plugin.jar file and as soon as a class from the myApp.jar file is required by the plugin, it crashes with a java.lang.NoClassDefFoundError message. My question is: how can I force rJava to load the classpath in the system classloader and not only in the RJavaClassLoader? I cannot make the PluginClassLoader know the RJavaClassLoader as my application also has to run in a non-R environment. Thanks for your time, Kind regards, Ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with conversion of variable labels (Hmisc and Epicalc)
On Sep 26, 2014, at 3:21 AM, Otto Pichlhöfer wrote: For my epidemiological analysis I use the packages Epicalc and Hmisc – among others. Both packages allow to assign variable labels that will appear in the output of the respective packages’ own functions. The modes of storage of the labels in a dataframe are very different. I am wonderiung if there is a function that would allow to easily convert Epicalc labels to Hmisc labels and possibly the other way around. Unfortunately I am not adept enough in R to write such a function myself. You needt to use the attributes function to display hte names and structures of the labels in the two packages. Firts look at help(pack=Hmisc) and help(pack=Epicalc) to see how the package-specific functions are documented. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] package loadable in R 3.1.1 Rterm but not in emacs/ESS
I'm running R on Windows 7. Clean install on a brand new computer yesterday. I installed Protext then R then Vincent Goulet's emacs with ESS, in that order. I then installed some R packages, in the R terminal window. Among them was car Today I opened emacs, hit M-x R to start an R session, and tried to load the car library. Results looked like the following: R version 3.1.1 (2014-07-10) -- Sock it to Me Copyright (C) 2014 The R Foundation for Statistical Computing Platform: i386-w64-mingw32/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. options(chmhelp=FALSE, help_type=text) options(STERM='iESS', str.dendrogram.last=', editor='emacsclient.exe', show.error.locations=TRUE) library(car) Error in library(car) : there is no package called 'car' However, in an R terminal (that is, outside of emacs) the car package loads fine: R version 3.1.1 (2014-07-10) -- Sock it to Me Copyright (C) 2014 The R Foundation for Statistical Computing Platform: i386-w64-mingw32/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. library(car) I have not encountered this before and am confused. Why would R 3.1.1 in its terminal see a library, whereas it would not in emacs? (car is just an example; the same thing happens with zoo, stringr, Hmisc, and others.) Thanks. --Chris Ryan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] question regarding pcalg package
Hi, I am trying to use the pcalg package to do some causal inference on some high dimensional omics data (~ 30k variables). It takes forever to run and my machine get stuck. I just realized that conditional independence test can be calculated without using the correlation matrix (I think it's just looping through all the partial correlation calculation), I am wondering after getting all the edges (maybe in a format of two column matrix with source and targets in each one), then how to use pc, fci, rfci, etc to do the causal inference. Right now it seems that we need to supply a correlation matrix into those algorithms in the sufficient statistics argument. Thanks, -Jack [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Savitzky-Golay Smoother
You can perform the equivalent filtering by hand using something like: k = (window_length - 1) / 2 for i in k to len(x) - k xh,yh = x[points-k:points+k], y[points-k:points+k] idx = isfinite(xh) and isfinite(yh) xh,yh = xh[idx],yh[idx] p = polyfit(xh,yh, degree=filter_order) filtered_y[i] = polyval(p,x[i]) Points at the beginning use xh,yh from 1 to window_length, and at the end from n-window_length to n. Savitsky-Golay uses the fact that the x points are equally spaced to precompute the weights on an FIR filter, so that filtered_y[i] = yh dot filter_coefficients The FIR filter coefficients for order p on window 2k+1 are determined from the top row of the pseudo inverse of the overdetermined polynomial equation C whose elements, Cij are (-k+i)^(p-j) for i in 0 to 2k, j in 0 to p. For an order 2 solver with 5 points, this would be: 4 -2 1 1 -1 1 0 0 1 1 1 1 4 2 1 Computing the pseudo-inverse, this gives filter coefficients: [ 0.14285714, -0.07142857, -0.14285714, -0.07142857, 0.14285714] If the central point were NA, the coefficients would have to be: [ 0.1667, -0.1667, NA, -0.1667, 0.1667] Sliding the window forward one, the coefficients become: [ 0.11363636, NA, -0.18181818, -0.09090909, 0.15909091] Clearly, there is not going to be a trivial operation on the data stream or filter coefficients which will handle the missing data. You have a few options. (1) if your data set is small, just do the filter manually. (2) if your data set is large but the missing data is sparse, replace missing data by infinity, then run the filter as normal. For each nonfinite number in the result, compute the filtered value manually. (3) if you are not too concerned about full precision (this is, after all, a smoothing operation), you could try replacing each missing value with the average of the surrounding values, and hopefully this will not disturb the polynomial approximations for the surrounding points too much. - Paul On Fri, Sep 26, 2014 at 3:32 AM, Erick Okuto erickok...@gmail.com wrote: Dear Paul and Henrik, I have a time series with some missing data points that i need smoothed using Savitzky-Golay filter. Related question was asked here http://thr3ads.net/r-help/2012/11/2121748-Savitzky-Golay-filtering-with-missing-data but no straight forward answer was posted. However, Henrik (cc'd here) did ask related question on smoothing for reflectance here http://thr3ads.net/r-help/2004/02/835137-Savitzky-Golay-smoothing-for-reflectance-data which i have as well been unable to follow up. I will be glad if you could assist me with some insights on the way forward or point to a relevant source of help. I have attached a simple test data plus the R code which only work with non missing data. I will appreciate any possible assistance. Thanks, Erick. -- *Erick Okuto, Ph.D.* * Candidate* *School of Mathematics Actuarial Science* *Jaramogi Oginga Odinga University of Science Technology (JOOUST)/ World Agroforestry Centre (ICRAF), **Climate Change Unit, Nairobi-Kenya. * Voice: +254207224154Mobile: +254725005276Skype id: erickokuto Email: erickok...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Histogram from a single column of a data frame
Column 7 of oded is Breed. If I enter summary(Breed) I get the counts of the numbers in each breed. However, if I enter I have tried hist($Breed) Error: unexpected '$' in hist($ hist(Breed) Error in hist(Breed) : object 'Breed' not found hist(Breed) Error in hist.default(Breed) : 'x' must be numeric FYI colnames(oded) [1] Subject.Name Date Species AgeSex [6] SpayedNeutered Breed Breed.Code How do I get a histogram of the counts in column 7? Thanks Richard [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with conversion of variable labels (Hmisc and Epicalc)
On Sep 26, 2014, at 4:21 AM, Otto Pichlhöfer wrote: For my epidemiological analysis I use the packages Epicalc and Hmisc – among others. Both packages allow to assign variable labels that will appear in the output of the respective packages’ own functions. The modes of storage of the labels in a dataframe are very different. I am wonderiung if there is a function that would allow to easily convert Epicalc labels to Hmisc labels and possibly the other way around. Unfortunately I am not adept enough in R to write such a function myself. Here's some further explorations on the topic. You should be able to see that epicalc (not Epicalc) uses labeling at a dataframe levels of attribute assignment while rms/Hmisc uses a column-level attribute creation. So you could actually have both systems working in the same dataframe. (Which could get pretty confusing if your short-term memory is as limited as mine.) require(epicalc) require(rms) # which loads Hmisc # This is the first section of the examples in sbp - c(120, 100, 110, 120, 140, 120, NA, NA) dbp - c( 80, 80, 70, 80, 70, NA, 70, 60) .data - data.frame(sbp, dbp) use(.data) I'm guessing that this `use`-function is sort of like R's attach function I strongly recommend against using `attach` and I suspect also against using `use` pack() # I have no idea what that does. des() # or that No. of observations = 8 Variable Class Description 1 sbp numeric 2 dbp numeric label.var(sbp, systolic BP) str(.data$sbp) num [1:8] 120 100 110 120 140 120 NA NA str(.data) 'data.frame': 8 obs. of 2 variables: $ sbp: num 120 100 110 120 140 120 NA NA $ dbp: num 80 80 70 80 70 NA 70 60 - attr(*, var.labels)= chr systolic BP That shows that the attributes are assigned to the dataframe by epicalc's `var.labels` ?label label(.data$sbp) - test.sbp str(.data) 'data.frame': 8 obs. of 2 variables: $ sbp:Classes 'labelled', 'numeric' atomic [1:8] 120 100 110 120 140 120 NA NA .. ..- attr(*, label)= chr test.sbp $ dbp: num 80 80 70 80 70 NA 70 60 - attr(*, var.labels)= chr systolic BP And that shows that `Hmisc::label` does its assignment to the individual column vector. If you wanted to assign Hmisc-type labels to the columns of a dataframe that has some or all of its columns lableded in the epicalc method then this loop succeeds: for( coln in seq_along(.data) ) { label(.data[[coln]]) - attr(.data, var.labels)[coln]} str(.data) 'data.frame': 8 obs. of 2 variables: $ sbp:Classes 'labelled', 'numeric' atomic [1:8] 120 100 110 120 140 120 NA NA .. ..- attr(*, label)= chr systolic BP $ dbp:Classes 'labelled', 'numeric' atomic [1:8] 80 80 70 80 70 NA 70 60 .. ..- attr(*, label)= chr - attr(*, var.labels)= chr systolic BP -- David Winsemius, MD Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Histogram from a single column of a data frame
Try hist(oded$Breed) (I suspect that summary(Breed) does not work in your current session either - perhaps you had a dataset named just Breed or had attached the data.frame oded in the session where summary(Breed) works.) Bill Dunlap TIBCO Software wdunlap tibco.com On Fri, Sep 26, 2014 at 10:57 AM, Richard Lerner richa...@uchicago.edu wrote: Column 7 of oded is Breed. If I enter summary(Breed) I get the counts of the numbers in each breed. However, if I enter I have tried hist($Breed) Error: unexpected '$' in hist($ hist(Breed) Error in hist(Breed) : object 'Breed' not found hist(Breed) Error in hist.default(Breed) : 'x' must be numeric FYI colnames(oded) [1] Subject.Name Date Species AgeSex [6] SpayedNeutered Breed Breed.Code How do I get a histogram of the counts in column 7? Thanks Richard [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Histogram from a single column of a data frame
On 27/09/14 05:57, Richard Lerner wrote: Column 7 of oded is Breed. If I enter summary(Breed) I get the counts of the numbers in each breed. However, if I enter I have tried hist($Breed) Error: unexpected '$' in hist($ hist(Breed) Error in hist(Breed) : object 'Breed' not found hist(Breed) Error in hist.default(Breed) : 'x' must be numeric FYI colnames(oded) [1] Subject.Name Date Species AgeSex [6] SpayedNeutered Breed Breed.Code How do I get a histogram of the counts in column 7? Are you for real, or is this some kind of joke? Assuming it's not a joke: (1) Look at ?$. (2) WTF do you expect hist($Breed) to do? Where is Breed going to be found, for pity's sake? You appear to have no understanding of how R (or any programming language) works. Read the Introduction to R manual and *get* some understanding if you are going to use R. (3) The correct usage obviously (and I mean ***really***; it *IS* obvious) is hist(oded$Breed) --- which tells hist() where to find Breed. (4) You could also use with(oded,hist(Breed)) which is a useful syntax in some circumstances. cheers, Rolf Turner -- Rolf Turner Technical Editor ANZJS __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Median of streaming data
On 26/09/14 21:48, Martin Maechler wrote: Rolf Turner r.tur...@auckland.ac.nz SNIP I have coded up the algorithm from the Cameron and Turner paper. Dunno if it gives exactly the same results as my (Splus?) code from lo these many years ago (the code that is lost in the mists of time), but it *seems* to work. excellent, thank you, Rolf! It is not designed to work with actual streaming data --- I don't know how to do that. It takes a complete data vector as input. Someone who knows about streaming data should be able to adapt it pretty easily. Said he, the proverbial optimist. I agree; that should not be hard. One way is to replace 'y[ind]' by 'getY(ind)' everywhere in the code and let 'getY' be an argument to rlas() provided by the user. The function code and a help file are attached. These files have had their names changed to end in .txt so that they will get through the mailing list processor without being stripped. With a bit of luck. ;-) It did work indeed. I've added them to 'robustX' -- on R-forge, including a plot() method and some little more flexibility. -- https://r-forge.r-project.org/R/?group_id=59 SNIP Since I posted my previous email I have found some typos in the documentation and made some adjustments to the code. I also realized that the name rlas sounds a bit too much like a random number generator, according to R's conventions. So I have decided that lasr would be a better name. I hope this change of horses in midstream doesn't mess things up too much for you. If it does, of course feel free to stick with rlas. I have attached revised *.R and *.Rd files. Not sure that the *.Rd file will get through as-is. Please let me know if it doesn't. cheers, Rolf -- Rolf Turner Technical Editor ANZJS lasr - function(y,b=0.2,mfn=function(n){0.1*n^(-0.25)}, nstart=30,scon=NULL) { # Initialize: y0- y[1:nstart] alpha - median(y0) s - if(is.null(scon)) mean(abs(y0-alpha)) else scon mu- mfn(nstart)/s eps - s/nstart^b kount - sum(abs(alpha-y0) eps) g - kount/(eps*nstart) ny- length(y) n - nstart+1 locn - numeric(ny) locn[1:(nstart-1)] - NA locn[nstart] - alpha scale - numeric(ny) scale[1:(nstart-1)] - if(is.null(scon)) NA else scon scale[nstart] - s # Calculate recursively: while(n = ny) { s - if(is.null(scon)) ((n-1)*s + abs(y[n]-alpha))/n else scon mu- mfn(n)/s eye - if(abs(alpha - y[n]) s/n^b) 1 else 0 g - (1-1/n)*g + n^(b-1)*eye/s a - max(mu,g) alpha - alpha + sign(y[n]-alpha)/(a*n) locn[n] - alpha scale[n] - s n - n+1 } list(locn=locn,scale=scale) } \name{lasr} \alias{lasr} \title{ Recursive location and scale. } \description{ Calculate an estimate of location, asymptotically equivalent to the median, and an estimate of scale equal to the \bold{MEAN} absolute deviation. Both done recursively. } \usage{ lasr(y, b = 0.2, mfn = function(n){0.1 * n^(-0.25)}, nstart = 30, scon = NULL) } \arguments{ \item{y}{ A numeric vector of i.i.d. data whose location and scale parameters are to be estimated. } \item{b}{ A tuning parameter (default value equal to that used by Holst, 1987). } \item{mfn}{ Function of the index of the data which must be positive and tend to 0 as the index tends to infinity. The default function is that used by Holst, 1987. } \item{nstart}{ Starting values for the algorithm are formed from the first \code{nstart} values of \code{y}. The default value is that used in Cameron and Turner, 1993. } \item{scon}{ A constant value for the scale parameter \code{s}. If this is provided then the algorithm becomes equivalent to the algorithm of Holst, 1987. } } \value{ A list with entries \item{locn}{The successive recursive estimates of location. The first \code{nstart - 1} of these are \code{NA}.} \item{scale}{The successive recursive estimates of scale. If \code{scon} is specified these are all equal to \code{scon}. Otherwise the first \code{nstart - 1} values are \code{NA}.} } \section{Remark}{ The \bold{mean} absolute deviation is used as an estimate of scale (rather than the more expected \bold{median} absolute deviation) simply because the former can be calculated recursively. } \references{ Cameron, Murray A. and Turner, T. Rolf (1993). Recursive location and scale estimators. \emph{Commun. Statist. --- Theory Meth.} \bold{22} (9) 2503--2515. Holst, U. (1987). Recursive estimators of location. \emph{Commun. Statist. --- Theory Meth.} \bold{16} (8) 2201--2226. } \author{ \email{r.tur...@auckland.ac.nz} \url{http://www.stat.auckland.ac.nz/~rolf} } \examples{ set.seed(42) y - rnorm(1) z1 - lasr(y) z2 - lasr(y,scon=1) z3 - lasr(y,scon=100) OP - par(mfrow=c(3,1)) plot(z1$locn,type=l) abline(h=median(y),col=red) plot(z2$locn,type=l) abline(h=median(y),col=red) plot(z3$locn,type=l)
Re: [R] Savitzky-Golay Smoother
On Fri, Sep 26, 2014 at 3:32 AM, Erick Okuto erickok...@gmail.com wrote: Dear Paul and Henrik, I have a time series with some missing data points that i need smoothed using Savitzky-Golay filter. Related question was asked here http://thr3ads.net/r-help/2012/11/2121748-Savitzky-Golay-filtering-with-missing-data but no straight forward answer was posted. However, Henrik (cc'd here) did ask related question on smoothing for reflectance here http://thr3ads.net/r-help/2004/02/835137-Savitzky-Golay-smoothing-for-reflectance-data which i have as well been unable to follow up. I will be glad if you could assist me with some insights on the way forward or point to a relevant source of help. Not Savitzky-Golay but if z is a time series then library(zoo) na.spline(z) will fill in NAs with spline curve fits. See ?na.spline There are other NA filling routines in zoo too: ls(pattern = ^na[.], package:zoo) [1] na.aggregate na.aggregate.default na.approx [4] na.approx.defaultna.fill na.fill.default [7] na.locf na.locf.default na.spline [10] na.spline.defaultna.StructTS na.trim [13] na.trim.default na.trim.ts -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] quadprog::solve.QP sometimes returns NaNs
Hello, Here is an example; hopefully it is reproducible on others' platform: library(quadprog) n - 66L set.seed(6860) X - matrix(1e-20, n, n) diag(X) - 1 Dmat - crossprod(X) y - seq_len(n) dvec - crossprod(X, y) Amat - diag(n) bvec - y + runif(n) sol - solve.QP(Dmat, dvec, Amat, bvec, meq = n) print(sol$solution) # this gives all NaNs under sessionInfo(): R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] quadprog_1.5-5 Any ideas? Thanks Ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.