Re: [R] COVID-19 datasets...
Sure. The COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University is available here: https://github.com/CSSEGISandData/COVID-19 All in CSV format.

> On May 4, 2020, at 11:31 AM, Bernard McGarvey wrote:
>
> Just curious, does anyone know of a website that has data available in a
> format that R can download and analyze?
>
> Thanks
>
> Bernard McGarvey
> Director, Fort Myers Beach Lions Foundation, Inc.
> Retired (Lilly Engineering Fellow).
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

James Spottiswoode
Applied Mathematics & Statistics
(310) 270 6220
jamesspottiswoode Skype
ja...@jsasoc.com
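For example, one of the repository's time-series CSVs can be read straight into R over HTTPS. (The raw-file path below is an assumption based on the repo's directory layout at the time of writing; check the repository for the current file names.)

```r
# Read the global confirmed-cases time series directly from GitHub.
# NOTE: the path within the repo is assumed from its layout and may change.
url <- paste0("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/",
              "master/csse_covid_19_data/csse_covid_19_time_series/",
              "time_series_covid19_confirmed_global.csv")
confirmed <- read.csv(url, check.names = FALSE)

# Columns are Province/State, Country/Region, Lat, Long, then one per date.
head(confirmed[, 1:6])
```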
[R] How to parallelize a process called by a socket connection
Hi R Experts,

I’m using R version 3.4.3 running under Linux on an AWS EC2 instance. I have R code listening on a port for a socket connection, which passes incoming data to a function whose results are then passed back to the calling machine. Here’s the function that listens for a socket connection:

    # define server function
    server <- function() {
      while (TRUE) {
        con <- socketConnection(host = "localhost", port = server_port,
                                blocking = TRUE, server = TRUE, open = "r+",
                                timeout = 1)
        data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
        response <- check(data)
        if (!is.null(response)) writeLines(response, con)
        close(con)  # close each connection so descriptors aren't leaked
      }
    }

The server function expects to receive a character string, which is then passed to the function check(). check() is a large, complex routine that does text analysis among many other things and returns a JSON string to be passed back to the calling machine.

This all works perfectly, except that while check() spends ~50 ms doing its work, no further requests can be received and processed. Therefore, if a new request comes in sooner than ~50 ms after the last one, it is not processed. I would like to parallelize this so that the box can run more than one check() process simultaneously. I’m familiar with several of the R parallelization packages, but I cannot see how to integrate them with the socket-connection side of things.

Currently I have a kludge: a round-robin approach. I have 4 copies of the whole R program listening on 4 different ports, say P1, P2, P3, P4, and the calling machine issues calls in sequence to ports P1, P2, P3, P4, P1, ... etc. This mitigates, but doesn’t solve, the problem.

Any advice would be greatly appreciated! Thanks.

James
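One possible approach on a Unix-alike, sketched under assumptions: fork a detached child per request with parallel::mcparallel(), so the accept loop returns immediately and check() runs concurrently. Because fork() is copy-on-write, the child inherits the already-loaded packages and data without reloading them. This is a sketch only; whether the child can safely finish writing to the inherited connection depends on the platform and needs testing.

```r
library(parallel)  # mcparallel() relies on fork(); Unix-alikes only

server <- function() {
  while (TRUE) {
    con <- socketConnection(host = "localhost", port = server_port,
                            blocking = TRUE, server = TRUE, open = "r+",
                            timeout = 1)
    data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
    # Fork a detached child to run check(); copy-on-write means the
    # packages and data loaded at startup are shared, not reloaded.
    mcparallel({
      response <- check(data)
      if (!is.null(response)) writeLines(response, con)
      close(con)  # child closes its own copy of the descriptor
    }, detached = TRUE)
    close(con)    # parent's copy; the accept loop continues at once
  }
}
```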
Re: [R] File names for mac newby
macOS is based on BSD Unix, so paths use the forward slash as a separator, and there are no drive letters like "c:". Your home directory is /Users/DFP, so e.g.

    temps <- read.table("/Users/DFP/Documents/ah/house/HouseTemps.txt",
                        header = TRUE, row.names = 1)

Best
James

> On Jan 21, 2020, at 9:20 AM, David wrote:
>
> I moved to a mac a few months ago after years in Windows, and I'm still
> learning basics. I want to create a data frame based on a text file called
> HouseTemps.txt. That file is inside a folder called house, which is inside
> one called ah, which may in turn be inside one called Documents. I tried
> various lines like:
>
> temps <- read.table("c:\\Users\\DFP\\Documents\\ah\\house\\HouseTemps.txt",
>                     header=T, row.names=1)
>
> based on my Windows DOS experience, but nothing I try works. So my question
> is: what do complete file names look like on a Mac?
>
> I tried Apple support, but they couldn't help me with R.
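Two portable alternatives worth knowing (assuming the file really is under Documents as described): "~" expands to your home directory on any Unix-alike, and file.path() builds paths without hard-coding a separator at all.

```r
# "~" expands to the home directory (/Users/DFP on this Mac):
temps <- read.table("~/Documents/ah/house/HouseTemps.txt",
                    header = TRUE, row.names = 1)

# file.path() assembles the same path portably, on any OS:
f <- file.path(path.expand("~"), "Documents", "ah", "house", "HouseTemps.txt")
temps <- read.table(f, header = TRUE, row.names = 1)
```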
Re: [R] Parameters optimization in r
Hi,

I’ve often come across this problem and have found genetic algorithms (GAs) to be extremely useful. I wrote my first GA code in the 1980s and have extensive experience with the method. The package rgenoud is a very full-featured GA implementation.

Just code up your parameters as arguments to a function wrapping your method (random forests or whatever), then define a target variable for performance or fitness, such as AUC or R^2, whichever is appropriate, and let the GA climb to the top of the fitness landscape. If you have a large problem you may want to speed things up by running parallel processes across cores or machines; rgenoud handles that well.

Good luck!

James

> On Oct 11, 2019, at 4:21 PM, javed khan wrote:
>
> Hi
>
> I would appreciate it if someone could provide a link to tutorials/videos
> where parameter tuning is performed in R. For instance, if we have to
> perform prediction/classification using random forest or another algorithm,
> how do different optimization algorithms tune the parameters of the random
> forest, such as the number of trees, etc.?
>
> Best regards
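The recipe above can be sketched concretely for random-forest tuning. This is a minimal sketch, not a definitive implementation: `train` and `valid` are hypothetical data frames with a factor response `y`, and the parameter ranges are placeholders.

```r
library(rgenoud)
library(randomForest)

# Fitness: out-of-sample accuracy for a given (ntree, mtry) pair.
# `train` and `valid` are assumed data frames with factor response `y`.
fitness <- function(par) {
  fit <- randomForest(y ~ ., data = train,
                      ntree = par[1], mtry = par[2])
  mean(predict(fit, newdata = valid) == valid$y)
}

res <- genoud(fitness, nvars = 2, max = TRUE,   # maximize accuracy
              Domains = rbind(c(50, 1000),      # ntree search range
                              c(1, 10)),        # mtry search range
              data.type.int = TRUE,             # integer-valued parameters
              pop.size = 50)

res$par  # best (ntree, mtry) the GA found
```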
Re: [R] Problem parallelizing across cores
> On Aug 28, 2019, at 4:44 PM, James Spottiswoode wrote:
>
> Hi Bert,
>
> Thanks for your advice. Actually I’ve already done this and have checked
> out the doParallel and future packages. The trouble with doParallel is that
> it forks R processes which spend a lot of time loading data and packages,
> whereas my function runs in 100 ms, so the parallelization doesn’t help.
> The future package keeps its children running, but I haven’t figured out
> how to get it to work in my application.
>
> Best — James
>
>> On Aug 28, 2019, at 3:39 PM, Bert Gunter wrote:
>>
>> I would suggest that you search on "parallel computing" at the Rseek.org
>> site. This brought up what seemed to be many relevant hits including, of
>> course, the High Performance and Parallel Computing CRAN task view.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>>
>> On Wed, Aug 28, 2019 at 3:18 PM James Spottiswoode wrote:
>> Hi All,
>>
>> I have a piece of well-optimized R code for doing text analysis running
>> under Linux on an AWS instance. The code first loads a number of packages
>> and some needed data, and the actual analysis is done by a function
>> called, say, f(string). I would like to parallelize calling this function
>> across the 8 cores of the instance to increase throughput. I have looked
>> at the packages doParallel and future but am not clear how to do this.
>> Any method that brings up a new R instance when the function is called
>> will not work for me, as the time to load the packages and data is
>> comparable to the execution time of the function, leading to no speed-up.
>> Therefore I need to keep a number of instances of the R code running
>> continuously, so that the data loading occurs only once, when the R
>> processes are first started, and thereafter the function f(string) is
>> ready to run in each instance. I hope I have put this clearly.
>>
>> I’d much appreciate any suggestions. Thanks in advance,
>>
>> James Spottiswoode

James Spottiswoode
Applied Mathematics & Statistics
(310) 270 6220
jamesspottiswoode Skype
ja...@jsasoc.com
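Since the reply mentions future, here is one hedged sketch of how its persistent-worker model might fit this case. With plan(multicore) on a Unix-alike, each future is a cheap fork of the master process, so packages and data loaded once at startup are inherited via copy-on-write rather than reloaded; f() here stands in for the poster's analysis function.

```r
library(future)

# On Unix, multicore futures fork the master, inheriting loaded
# packages and data via copy-on-write instead of reloading them.
plan(multicore, workers = 8)

fut <- future(f(string))   # dispatch f() to a forked worker; returns at once
# ... the master is free to accept more work here ...
result <- value(fut)       # block only when the result is actually needed
```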
[R] Problem parallelizing across cores
Hi All,

I have a piece of well-optimized R code for doing text analysis running under Linux on an AWS instance. The code first loads a number of packages and some needed data, and the actual analysis is done by a function called, say, f(string). I would like to parallelize calling this function across the 8 cores of the instance to increase throughput.

I have looked at the packages doParallel and future but am not clear how to do this. Any method that brings up a new R instance when the function is called will not work for me, as the time to load the packages and data is comparable to the execution time of the function, leading to no speed-up. Therefore I need to keep a number of instances of the R code running continuously, so that the data loading occurs only once, when the R processes are first started, and thereafter the function f(string) is ready to run in each instance. I hope I have put this clearly.

I’d much appreciate any suggestions. Thanks in advance,

James Spottiswoode
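The requirement described (pay the package/data load cost once, then keep warm workers ready for f(string)) maps onto a persistent cluster from the base parallel package. A minimal sketch, where the package and data-file names are placeholders for the poster's real ones:

```r
library(parallel)

cl <- makeCluster(8)           # 8 worker R processes, started once

clusterEvalQ(cl, {             # pay the load cost once per worker
  library(mytextpkg)           # hypothetical analysis package
  load("analysis_data.RData")  # hypothetical data file
})
clusterExport(cl, "f")         # ship f() to the workers once

# Thereafter every call reuses the warm workers with no reloading:
results <- parSapply(cl, strings, f)  # `strings`: vector of inputs

stopCluster(cl)                # shut workers down when done
```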