As parameters. For example, if you have 100 simulations, set up a list of 4
distinct sets of data (1:25, 26:50, and so on) and have parLapply call the
single-threaded processing function once per block. Each instance of the
processing function then won't return until it has completed its 25
simulations, so the worker-dispatch overhead is paid 4 times instead of 100.
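A minimal sketch of this chunking pattern, assuming R's parallel package; the
data sets, block sizes, and the run_block helper are illustrative stand-ins,
not code from this thread:

library(parallel)

# 100 simulated data sets (stand-ins for the real inputs)
sims <- replicate(100, rnorm(1000), simplify = FALSE)

# split into 4 distinct blocks: sims 1:25, 26:50, 51:75, 76:100
blocks <- split(sims, rep(1:4, each = 25))

# single-threaded worker: loops over one whole block and returns
# one result per simulation
run_block <- function(block) {
  lapply(block, function(d) sum(d))  # placeholder per-simulation work
}

cl <- makeCluster(4)
# one task per block, so each worker call covers 25 simulations
res <- parLapply(cl, blocks, run_block)
stopCluster(cl)

results <- do.call(c, res)  # flatten back to one result per simulation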
Jeffrey Flint <jeffrey.fl...@gmail.com> wrote:

>How can I copy distinct blocks of data to each process?
>
>On Mon, Oct 14, 2013 at 10:21 PM, Jeff Newmiller
><jdnew...@dcn.davis.ca.us> wrote:
>> The session info is helpful. To the best of my knowledge there is no
>> easy way to share memory between R processes other than forking. You
>> can use clusterExport to make "global" copies of large data structures
>> in each process and pass index values to your function, reducing copy
>> costs at the price of extra data copies in each process that won't be
>> used. Or you can copy distinct blocks of data to each process and use
>> single-threaded processing to loop over the blocks within the workers,
>> reducing the number of calls to the workers. However, I don't claim to
>> be an expert with the parallel package, so others may have better
>> advice. With two cores I don't usually get better than a 30% speedup;
>> the best payoff comes with four or more workers.
>>
>> Jeffrey Flint <jeffrey.fl...@gmail.com> wrote:
>>> Jeff:
>>>
>>> Thank you for your response. Please let me know how I can
>>> "unhandicap" my question. I tried my best to be concise. Maybe this
>>> will help:
>>>
>>> > version
>>>                _
>>> platform       i386-w64-mingw32
>>> arch           i386
>>> os             mingw32
>>> system         i386, mingw32
>>> status
>>> major          3
>>> minor          0.2
>>> year           2013
>>> month          09
>>> day            25
>>> svn rev        63987
>>> language       R
>>> version.string R version 3.0.2 (2013-09-25)
>>> nickname       Frisbee Sailing
>>>
>>> I understand your comment about forking. You are right that forking
>>> is not available on Windows.
>>>
>>> What I am curious about is whether I can direct the execution of the
>>> parallel package's functions to diminish the overhead. My guess is
>>> that there is overhead in copying the function to be executed at each
>>> iteration, and overhead in copying the data to be used at each
>>> iteration. Are there any paradigms in the parallel package to reduce
>>> these overheads? For instance, I could use clusterExport to establish
>>> the function to be called. But I don't know if there is a technique
>>> whereby I could point to the data to be used by each CPU so as to
>>> prevent a copy.
>>>
>>> Jeff
>>>
>>> On Mon, Oct 14, 2013 at 2:35 PM, Jeff Newmiller
>>> <jdnew...@dcn.davis.ca.us> wrote:
>>>> Your question misses on several points in the Posting Guide, so any
>>>> answers are handicapped.
>>>>
>>>> There is overhead in using parallel processing, and the value of two
>>>> cores is marginal at best. In general, parallel by forking is more
>>>> efficient than parallel by SNOW, but the former is not available on
>>>> all operating systems. This is discussed in the vignette for the
>>>> parallel package.
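A minimal sketch of the clusterExport pattern described in the quoted reply
above; the matrix, its dimensions, and the column-sum work are illustrative
assumptions, not code from this thread:

library(parallel)

cl <- makeCluster(2)

# large object copied once into each worker's global environment
bigmat <- matrix(rnorm(1000 * 100), nrow = 1000)
clusterExport(cl, "bigmat")

# only small index values travel with each task; workers read their
# own copy of bigmat instead of receiving the data on every call
res <- parLapply(cl, 1:100, function(j) sum(bigmat[, j]))

stopCluster(cl)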
>>>> Jeffrey Flint <jeffrey.fl...@gmail.com> wrote:
>>>>> I'm running package parallel in R-3.0.2.
>>>>>
>>>>> Below are the execution times from system.time when executing
>>>>> serially versus in parallel (with 2 cores) using parRapply.
>>>>>
>>>>> Serially:
>>>>>    user  system elapsed
>>>>>    4.67    0.03    4.71
>>>>>
>>>>> Using package parallel:
>>>>>    user  system elapsed
>>>>>    3.82    0.12    6.50
>>>>>
>>>>> There is an evident improvement in the user CPU time, but a big
>>>>> jump in the elapsed time.
>>>>>
>>>>> In my code, I am executing a function on a 1000-row matrix 100
>>>>> times, with the data different each time of course.
>>>>>
>>>>> The initial call to makeCluster cost 1.25 seconds of elapsed time.
>>>>> I'm not concerned about the makeCluster time, since that is a
>>>>> fixed cost. I am concerned about the additional 1.43 seconds of
>>>>> elapsed time (6.50 = 3.82 + 1.25 + 1.43).
>>>>>
>>>>> I am wondering if there is a way to structure the code to largely
>>>>> avoid the 1.43-second overhead. For instance, perhaps I could
>>>>> upload the function to both cores manually, to avoid the function
>>>>> being uploaded at each of the 100 iterations? I am also wondering
>>>>> if there is a way to avoid any copying that is occurring at each
>>>>> of the 100 iterations.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Jeff Flint
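A rough sketch of how the fixed makeCluster cost can be separated from the
per-call overhead described in the original post; the matrix size and the
per-row function f are stand-ins, not the actual code from this thread:

library(parallel)

m <- matrix(rnorm(1000 * 10), nrow = 1000)  # stand-in 1000-row matrix
f <- function(row) sum(row)                 # stand-in per-row function

# serial baseline: 100 executions
system.time(for (i in 1:100) apply(m, 1, f))

# fixed startup cost, paid once
system.time(cl <- makeCluster(2))

# 100 separate parRapply calls: each one pays dispatch and copying costs
system.time(for (i in 1:100) parRapply(cl, m, f))

stopCluster(cl)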