Re: [R] COVID-19 datasets...

2020-05-05 Thread James Spottiswoode
Sure. The COVID-19 Data Repository by the Center for Systems Science and
Engineering (CSSE) at Johns Hopkins University is available here:

https://github.com/CSSEGISandData/COVID-19

All in CSV format.
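A minimal sketch of pulling one of the time-series files from that repository straight into R; the exact file path within the repository is an assumption and may have changed:

```r
# Build the raw-GitHub URL for the global confirmed-cases time series
# (path within the CSSE repository is an assumption)
base <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
file <- "csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
url  <- paste0(base, file)

# Read it directly over HTTPS; keep the date column names unmangled
confirmed <- read.csv(url, check.names = FALSE)
str(confirmed[, 1:5])
```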


> On May 4, 2020, at 11:31 AM, Bernard McGarvey  
> wrote:
> 
> Just curious, does anyone know of a website that has data available in a 
> format that R can download and analyze?
>  
> Thanks
> 
> 
> Bernard McGarvey
> 
> 
> Director, Fort Myers Beach Lions Foundation, Inc.
> 
> 
> Retired (Lilly Engineering Fellow).
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

James Spottiswoode
Applied Mathematics & Statistics
(310) 270 6220
jamesspottiswoode Skype
ja...@jsasoc.com




[R] How to parallelize a process called by a socket connection

2020-02-01 Thread James Spottiswoode
Hi R Experts,

I’m using R version 3.4.3 running under Linux on an AWS EC2 instance.  I have 
R code listening on a port for a socket connection; incoming data is passed 
to a function, and the results are then passed back to the calling machine.  
Here’s the function that listens for a socket connection:

# define server function
server <- function() {
  while (TRUE) {
    # open a blocking server-side socket and wait for one client
    con <- socketConnection(host = "localhost", port = server_port,
                            blocking = TRUE, server = TRUE,
                            open = "r+", timeout = 1)
    data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
    response <- check(data)
    if (!is.null(response)) writeLines(response, con)
    close(con)  # release the connection before accepting the next client
  }
}

The server function expects to receive a character string which is then passed 
to the function check().  check() is a large, complex routine which does text 
analysis and many other things and returns a JSON string to be passed back to 
the calling machine.  

This all works perfectly except that while check() spends ~50 ms doing its work, 
no more requests can be received and processed. Therefore, if a new request 
comes in sooner than ~50 ms after the last one, it is not processed. I would 
therefore like to parallelize this so that the box can run more than one 
check() process simultaneously.  I’m familiar with several of the R 
parallelization packages, but I cannot see how to integrate them with the 
socket-connection side of things.  

Currently I have a kludge which is a round-robin approach to solving the 
problem.  I have 4 versions of the whole R code listening on 4 different ports, 
say P1, P2, P3, P4, and the calling machine issues calls in sequence to ports 
P1,P2,P3,P4,P1… etc. This mitigates, but doesn’t solve, the problem.
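One possible alternative to the round-robin kludge, sketched below under the assumption that the box runs Linux (fork-based) and that check() and server_port are defined as in the code above: accept the connection in the parent, then hand the slow work to a forked child with parallel::mcparallel(), so the loop can go back to accepting the next request immediately.

```r
library(parallel)

# Sketch only: assumes a Unix/Linux host, and that check() and
# server_port exist as in the original code.
server <- function() {
  while (TRUE) {
    con <- socketConnection(host = "localhost", port = server_port,
                            blocking = TRUE, server = TRUE,
                            open = "r+", timeout = 1)
    data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
    # Fork a child to run the slow check(); packages and data already
    # loaded in the parent are inherited copy-on-write, so the fork is
    # cheap compared with starting a fresh R process.
    mcparallel({
      response <- check(data)
      if (!is.null(response)) writeLines(response, con)
      close(con)
    }, detached = TRUE)
  }
}
```

Whether the forked child can safely write back on the inherited connection is platform-dependent, so treat this as a starting point rather than production code.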

Any advice would be greatly appreciated!  Thanks.

James 



Re: [R] File names for mac newby

2020-01-21 Thread James Spottiswoode
macOS is based on BSD Unix, so paths use the forward slash as the separator 
and there are no drive letters. A file under your home directory looks like:

temps <- read.table("/Users/DFP/Documents/ah/house/HouseTemps.txt", header=TRUE, row.names=1)

Best James
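A portable way to build that path, assuming the files live under the user's home directory ("DFP" is carried over from the Windows path and is just illustrative):

```r
# "~" expands to the current user's home directory on macOS and Linux
p <- file.path("~", "Documents", "ah", "house", "HouseTemps.txt")
path.expand(p)   # e.g. "/Users/<username>/Documents/ah/house/HouseTemps.txt" on macOS

# then read it as before (uncomment once the file exists at that path):
# temps <- read.table(p, header = TRUE, row.names = 1)
```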

> On Jan 21, 2020, at 9:20 AM, David  wrote:
> 
> I moved to a mac a few months ago after years in windows, and I'm still 
> learning basics.  I'm wanting to create a data frame based on a text file 
> called HouseTemps.txt.  That's a file within one called house which is within 
> one called ah.  That may further be in one called  Documents.  I tried 
> various lines like:
> 
> temps <- 
> read.table("c:\\Users\\DFP\\Documents\\ah\\house\\HouseTemps.txt",header=T,row.names=1)
> 
> based on my windows DOS experience, but nothing I try works.  So my question 
> is, what do complete file names look like in a mac?
> 
> I tried Apple support, but they couldn't help me with R.
> 
> 




Re: [R] Parameters optimization in r

2019-10-12 Thread James Spottiswoode
Hi,

I’ve often come across this problem and have found genetic algorithms (GAs) to 
be extremely useful. I wrote my first GA code in the 1980s and have extensive 
experience with the method. The package rgenoud is a very full-featured GA 
implementation.  Just code up your parameters as arguments to a function that 
runs your method, random forests or whatever, then define a target variable 
for performance or fitness, such as AUC or R^2, whichever is appropriate, and let 
the GA climb to the top of the fitness landscape.  If you have a large problem 
you may want to speed things up by using parallel processes across cores or 
machines.  rgenoud handles that well.
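As a concrete (and entirely hypothetical) sketch of that recipe — parameters as function arguments, one fitness value to optimize — here is what tuning two random-forest parameters with rgenoud::genoud() might look like. The data set, parameter ranges, and fitness choice are illustrative assumptions, not a recommendation:

```r
library(rgenoud)
library(randomForest)

# Fitness: out-of-bag error rate for a given (ntree, mtry).
# genoud() minimizes by default, so lower error means fitter.
fitness <- function(par) {
  ntree <- round(par[1])
  mtry  <- round(par[2])
  fit <- randomForest(Species ~ ., data = iris, ntree = ntree, mtry = mtry)
  mean(fit$err.rate[ntree, "OOB"])
}

best <- genoud(fitness, nvars = 2,
               Domains = rbind(c(50, 500),   # ntree range (assumed)
                               c(1, 4)),     # mtry range (assumed)
               pop.size = 50, max.generations = 10,
               boundary.enforcement = 2, print.level = 0)
best$par   # tuned (ntree, mtry)
```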

Good luck!

James


> On Oct 11, 2019, at 4:21 PM, javed khan  wrote:
> 
> Hi
> 
> I will appreciate if someone provide the link to some tutorials/videos
> where parameters running are performed in R. For instance, if we have to
> perform predictions/classification using random forest or other algorithm,
> how different optimization algorithms tune the parameters of random forest
> such as numbers of trees etc.
> 
> Best regards
> 
>   [[alternative HTML version deleted]]
> 
> 



Re: [R] Problem parallelizing across cores

2019-08-29 Thread James Spottiswoode



> On Aug 28, 2019, at 4:44 PM, James Spottiswoode  wrote:
> 
> Hi Bert,
> 
> Thanks for your advice.  Actually I’ve already done this and have checked out 
> the doParallel and future packages.  The trouble with doParallel is that it forks 
> R processes which spend a lot of time loading data and packages, whereas my 
> function runs in 100 ms, so the parallelization doesn’t help.  The future 
> package keeps its children running, but I haven’t figured out how to get it 
> to work in my application.
> 
> Best — James
> 
> 
>> On Aug 28, 2019, at 3:39 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>> 
>> 
>> I would suggest that you search on "parallel computing" at the 
>> Rseek.org site. This brought up what seemed to be many 
>> relevant hits including, of course, the High Performance and Parallel 
>> Computing CRAN task view.
>> 
>> Cheers,
>> Bert
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along and 
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Wed, Aug 28, 2019 at 3:18 PM James Spottiswoode 
>> <james.spottiswo...@gmail.com> wrote:
>> Hi All,
>> 
>> I have a piece of well optimized R code for doing text analysis running
>> under Linux on an AWS instance.  The code first loads a number of packages
>> and some needed data and the actual analysis is done by a function called,
>> say, f(string).  I would like to parallelize calling this function across
>> the 8 cores of the instance to increase throughput.  I have looked at the
>> packages doParallel and future but am not clear how to do this.  Any method
>> that brings up an R instance when the function is called will not work for
>> me as the time to load the packages and data is comparable to the execution
>> time of the function leading to no speed up.  Therefore I need to keep a
>> number of instances of the R code running continuously so that the data
>> loading only occurs once when the R processes are first started and
>> thereafter the function f(string) is ready to run in each instance.  I hope
>> I have put this clearly.
>> 
>> I’d much appreciate any suggestions.  Thanks in advance,
>> 
>> James Spottiswoode
>> 
>> 
> 
> James Spottiswoode
> Applied Mathematics & Statistics
> (310) 270 6220
> jamesspottiswoode Skype
> ja...@jsasoc.com

James Spottiswoode
Applied Mathematics & Statistics
(310) 270 6220
jamesspottiswoode Skype
ja...@jsasoc.com





[R] Problem parallelizing across cores

2019-08-28 Thread James Spottiswoode
Hi All,

I have a piece of well optimized R code for doing text analysis running
under Linux on an AWS instance.  The code first loads a number of packages
and some needed data and the actual analysis is done by a function called,
say, f(string).  I would like to parallelize calling this function across
the 8 cores of the instance to increase throughput.  I have looked at the
packages doParallel and future but am not clear how to do this.  Any method
that brings up an R instance when the function is called will not work for
me as the time to load the packages and data is comparable to the execution
time of the function leading to no speed up.  Therefore I need to keep a
number of instances of the R code running continuously so that the data
loading only occurs once when the R processes are first started and
thereafter the function f(string) is ready to run in each instance.  I hope
I have put this clearly.
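One pattern that fits this constraint, sketched under the assumption of a Unix/Linux host (fork-based clusters are not available on Windows): do all the loading first, then fork the worker pool once with parallel::makeForkCluster(), so every worker inherits the already-loaded packages and data and only f(string) runs per call. Here f() is a trivial stand-in for the real analysis function:

```r
library(parallel)

# Stand-in for the expensive one-time setup; in the real code this is
# where the packages and data are loaded, before any forking happens.
f <- function(s) nchar(s)

# Fork 8 workers AFTER setup: each child inherits the loaded state via
# fork, so the startup cost is paid exactly once (Unix/Linux only).
cl <- makeForkCluster(8)

# Dispatch a batch of strings across the warm workers
strings <- c("alpha", "bravo", "charlie")
results <- unlist(parLapply(cl, strings, f))
results    # 5 5 7

stopCluster(cl)
```

The same idea also underlies the future package's persistent multicore workers; the fork cluster above is just the base-R way to express it.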

I’d much appreciate any suggestions.  Thanks in advance,

James Spottiswoode


