In most threaded multitasking environments it is not safe to perform IO in
multiple threads. In general you will have difficulty performing IO in parallel
processing, so it is best to let the master hand out data to worker tasks and
gather their results for storage. Keep in mind that just because you have
eight cores for processing doesn't mean you have eight hard disks: if your
problem is IO-bound in single-processor operation, it will also be IO-bound
in threaded operation.
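A minimal sketch of the "master does the IO" pattern, applied to the read.table() case in the quoted question. It assumes a Unix-alike where mclapply() can fork worker processes (on Windows it falls back to mc.cores = 1). The temporary CSV files are hypothetical demo data. The master reads the raw text serially, the workers only parse in-memory text with read.csv(text = ...), and the master gathers and stores the results. This is an illustration, not a diagnosis of the failures reported below.

```r
library(parallel)

# Hypothetical demo data: a few small CSV files in a temporary directory.
dir <- tempfile("demo_")
dir.create(dir)
files <- file.path(dir, sprintf("part%02d.csv", 1:4))
for (i in seq_along(files)) {
  write.csv(data.frame(id = 1:3, value = i * (1:3)), files[i], row.names = FALSE)
}

# 1. Master: read the raw text of every file serially (all disk IO happens here).
raw_text <- lapply(files, readLines)

# 2. Workers: parse the in-memory text only; no worker touches the disk.
#    Forking is unavailable on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
parsed <- mclapply(raw_text,
                   function(lines) read.csv(text = lines),
                   mc.cores = n_cores)

# 3. Master: gather the results and do the output (storage) serially.
combined <- do.call(rbind, parsed)
write.csv(combined, file.path(dir, "combined.csv"), row.names = FALSE)
```

Since all disk access stays in one process, this only pays off when parsing, not reading, dominates the run time; an IO-bound job stays IO-bound.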
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

"Hopkins, Bill" <bill.hopk...@level3.com> wrote:
>Has there been any systematic evaluation of which core R functions are
>safe for use with multicore? Of current interest, I have tried calling
>read.table() via mclapply() to more quickly read in hundreds of raw
>data files (I have a 24 core system with 72 GB running Ubuntu, a
>perfect platform for multicore). There was a 40% failure rate, which
>doesn't occur when I invoke read.table() serially from within a single
>thread. Another example was using pvec() to invoke
>sapply(strsplit(),...) on a huge character vector (to pull out fields
>from within a field). It looked like a perfect application for pvec(),
>but it fails where serial execution works.
>
>I thought I'd ask before taking on the task of digging into the
>underlying code to see what might be causing failures in a multicore
>(well, multi-threaded) context.
>
>As an alternative, I could define multiple cluster nodes locally, but
>that shifts the tradeoff a bit in whether parallel execution is
>advantageous: the overhead is significantly higher, and even with 72 GB,
>it does impose greater limits on how many cores can be used.
>
>Bill Hopkins
>
>______________________________________________
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
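A minimal sketch of the pvec() use Bill describes, pulling one sub-field out of a compound character field. The input vector and the helper second_field() are hypothetical. Per the parallel package documentation, pvec() expects FUN to be a vectorized map that returns output of exactly the same length as its input, independent of how the vector is split; vapply() with a character(1) template enforces one result per element. The helper does no IO, and the parallel result is checked against the serial one. It assumes a Unix-alike for mc.cores > 1.

```r
library(parallel)

# Hypothetical input: each element holds a comma-separated compound field.
x <- c("a,1,x", "b,2,y", "c,3,z", "d,4,w", "e,5,v", "f,6,u")

# IO-free helper: return the second sub-field of each element
# (NA if an element has fewer than two sub-fields).
second_field <- function(v) {
  vapply(strsplit(v, ",", fixed = TRUE),
         function(parts) parts[2],
         character(1))
}

# pvec() cannot fork on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_result   <- second_field(x)
parallel_result <- pvec(x, second_field, mc.cores = n_cores)

# The split-apply-combine result should match the serial computation.
stopifnot(identical(serial_result, parallel_result))
```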

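For comparison, a sketch of the local-cluster alternative Bill mentions, still following the advice to keep IO in the master. The in-memory CSV text is hypothetical demo data. makeCluster(2L) starts two separate R worker processes (a PSOCK cluster by default), and parLapply() sends each chunk of text to them for parsing. Process startup, copying data to the nodes, and each node's own memory footprint are the extra overhead being weighed in the question.

```r
library(parallel)

# Hypothetical data already held by the master: raw CSV text for four "files".
raw_text <- lapply(1:4, function(i) {
  c("id,value", paste(1:3, i * (1:3), sep = ","))
})

# Each cluster node is a separate R process with its own memory.
cl <- makeCluster(2L)
results <- tryCatch(
  # Workers only parse text sent to them; they never touch the disk.
  parLapply(cl, raw_text, function(lines) read.csv(text = lines)),
  finally = stopCluster(cl)  # always shut the worker processes down
)

# The master gathers the parsed pieces (and would do any storage itself).
combined <- do.call(rbind, results)
```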