Hi Steve --

steve_fried...@nps.gov wrote:
> Hello
>
> I am fortunate (or in really big trouble) in that the research group I
> work with will soon be receiving several high-end dual quad-core
> machines. We will use the Ubuntu OS on these. We intend to use this
> cluster for some extensive modeling applications. Our programming guru
> has demonstrated the ability to link much simpler machines to share
> CPUs, and we purchased the new ones to take advantage of this option.
> We have also begun exploring the R CUDA and J CUDA functionality to
> push the processing to the graphics processor (GPU), which greatly
> speeds up the numerical work.
>
> My question(s) to this group:
Last question first: the R-sig-hpc list might be more appropriate for an
extended discussion.

  https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

See also the HighPerformanceComputing task view:

  http://cran.fhcrc.org/web/views/HighPerformanceComputing.html

> 1) Which packages are suitable for parallel processing applications in
> R?
>
> 2) Are these packages ready for prime time applications or are they
> developmental at this time?

I use Rmpi for all my parallel computing, but if I had more time I'd
explore multicore for more efficient use of several CPUs on a single
machine, and the new offerings from Revolution Computing. If there were
significant portions of C code I'd look into using OpenMP (as done in
the pnmath package), and into a parallel BLAS / LAPACK library if that
was where the significant computation was occurring. (Minimal sketches
of the Rmpi and multicore approaches are at the end of this message.)

> 3) Are we better off working in Java or C++ for the majority of this
> simulation work and linking to R for statistical analysis?
>
> 4) What are the pitfalls, if any, that I need to be aware of?

With multiple cores, it's important to remember that memory is divided
amongst the CPUs, so that huge-sounding 32 GB, 8-core machine has 'only'
4 GB per CPU when an independent R process is allocated to each core (as
is the style with Rmpi).

> 5) Can we take advantage of sharing the GPU, via R CUDA, in a parallel
> distributed shared cluster of dedicated machines?
>
> 6) Our statistical analysis and modeling applications address very
> large geographic issues. We generally work with 30-40 year daily time
> step data in a gridded format. The grid is approximately 250 x 400
> cells in extent, each representing approximately 500 meters x 500
> meters. To this we add a very large suite of ancillary information,
> both spatial and non-spatial, to simulate a variety of ecological
> state conditions. My question is - is this too large for R, given its
> use of memory?

Depending on the application, large data sets can often be managed
effectively on disk, e.g., by using the ncdf package (for large numeric
data) or a data base (the RSQLite package bundles SQLite, for instance),
and analyzing independent 'slices'. This fits well with common parallel
computing paradigms. (A back-of-envelope memory calculation and an ncdf
sketch are also at the end of this message.)

> 7) I currently have a laptop with Ubuntu with R version 2.6.2
> (2008-02-08). What is the most recent R version for Ubuntu and what is
> the installation procedure?
>
> These are just the initial questions that I'm sure to have. If these
> are being directed to the wrong help pages, I'm sorry to have taken
> your time. If you would be so kind as to direct me to the more
> appropriate help site, I'd appreciate your assistance.
>
> Thanks in advance,
> Steve
>
> Steve Friedman, Ph.D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> steve_fried...@nps.gov
> Office (305) 224-4282
> Fax (305) 224-4147
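As promised above, a minimal Rmpi sketch. It assumes Rmpi and a working
MPI installation are already in place; the worker count and the toy
simulate.cell function are purely illustrative:

  library(Rmpi)

  ## One worker per core, keeping one core for the master
  ## (an 8-core machine is assumed here for illustration).
  mpi.spawn.Rslaves(nslaves = 7)

  ## Stand-in for the real simulation; replace with your own function.
  simulate.cell <- function(i) mean(rnorm(1e5, mean = i))

  ## Scatter the 100 tasks across the workers and gather the results.
  res <- mpi.parSapply(1:100, simulate.cell)

  mpi.close.Rslaves()
  mpi.quit()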
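On a single multi-core box, multicore is simpler still -- mclapply is a
drop-in for lapply that forks worker R processes. Same caveat: the toy
function is made up, and mc.cores should match your hardware:

  library(multicore)

  ## lapply semantics, but the calls run in forked R processes.
  res <- mclapply(1:100,
                  function(i) mean(rnorm(1e5, mean = i)),
                  mc.cores = 8)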
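On question 6, a back-of-envelope calculation shows why the on-disk /
slicing approach matters for a grid that size:

  cells <- 250 * 400          # grid cells
  steps <- 40 * 365           # daily time steps over 40 years
  bytes <- cells * steps * 8  # stored as 8-byte doubles
  bytes / 2^30                # ~10.9 GiB for a single variable in RAM

So even one fully-loaded daily variable exceeds the 4 GB available to
each R process on the machine sketched above, while a single time slice
is only 250 * 400 * 8 bytes, well under 1 MB.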
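Finally, a sketch of the slicing idea with ncdf. The file and variable
names here are invented, but open.ncdf / get.var.ncdf / close.ncdf are
the package's interface, and a count of -1 reads an entire dimension:

  library(ncdf)

  ## Hypothetical file with a variable laid out as (x, y, time).
  nc <- open.ncdf("everglades_daily.nc")

  ## Pull a single day's 250 x 400 grid instead of the whole series.
  day1 <- get.var.ncdf(nc, "soil_moisture",
                       start = c(1, 1, 1),
                       count = c(-1, -1, 1))

  close.ncdf(nc)

Each Rmpi worker can open the file read-only and fetch just its own
slices, which is what makes this fit the parallel paradigms above.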