Has there been any systematic evaluation of which core R functions are safe 
for use with multicore? Of current interest: I have tried calling 
read.table() via mclapply() to read hundreds of raw data files more quickly 
(I have a 24-core system with 72 GB of RAM running Ubuntu, a perfect 
platform for multicore). There was a 40% failure rate, which does not occur 
when I invoke read.table() serially in a single R process.
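A minimal sketch of the pattern (the directory, file pattern, and 
read.table() arguments are placeholders, not my actual code):

library(parallel)  # mclapply() was in multicore; it now lives in parallel

## Read each file in a forked child. Elements that failed come back as
## try-error objects instead of data frames.
files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
tabs <- mclapply(files, function(f) read.table(f, header = TRUE),
                 mc.cores = 24)
sum(sapply(tabs, inherits, "try-error"))  # count the failed reads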
from within a single thread. Another example was using pvec() to invoke 
sapply(strsplit(),...) on a huge character vector (to pull out fields from 
within a field). It looked like a perfect application for pvec(), but it fails 
when serial execution works.
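Again, a minimal sketch (the delimiter, field position, and test data are 
made up for illustration):

library(parallel)  # pvec() likewise moved from multicore into parallel

## Pull the second "|"-delimited subfield out of each element. The
## function maps a character vector to one of the same length, which is
## what pvec() requires of FUN.
x <- rep("alpha|beta|gamma", 1e6)
second <- pvec(x,
               function(v) sapply(strsplit(v, "|", fixed = TRUE), `[`, 2),
               mc.cores = 24)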

I thought I'd ask before taking on the task of digging into the underlying 
code to see what might be causing failures in a multicore (more precisely, 
forked multi-process) context.

As an alternative, I could define multiple cluster nodes locally, but that 
shifts the tradeoff in whether parallel execution is advantageous: the 
overhead is significantly higher, and because each node is a separate R 
process holding its own copy of the data (rather than a fork sharing pages 
copy-on-write), even with 72 GB it imposes tighter limits on how many cores 
can be used.
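For comparison, a sketch of the cluster version (using the snow-style API 
that is now also in parallel; the worker count is a guess at what memory 
would allow):

library(parallel)

## Each worker is a full R process with its own copy of whatever it
## touches, so fewer workers than physical cores may be all memory allows.
cl <- makeCluster(8)
files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
tabs <- parLapply(cl, files, function(f) read.table(f, header = TRUE))
stopCluster(cl)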

Bill Hopkins
