Hello R-devel,

A function to be run inside lapply() or one of its friends is trivial
to augment with side effects to show a progress bar. When the code is
intended to be run on a 'parallel' cluster, it generally cannot rely on
its own side effects to report progress.

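
For reference, the sequential case can be sketched as follows. This is
only a minimal illustration: slow_square() and X are made-up stand-ins
for a real workload, and the progress bar comes from 'utils'.

```r
# Hypothetical workload: slow_square() and X are illustrative only.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}
X <- 1:20

pb <- txtProgressBar(min = 0, max = length(X), style = 3)
res <- lapply(seq_along(X), function(i) {
  out <- slow_square(X[[i]])
  # Side effect: updates the bar in the process that runs the function.
  setTxtProgressBar(pb, i)
  out
})
close(pb)
```

Inside parLapply() the same setTxtProgressBar() call would run in a
worker process, so the bar in the main session would not move.
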
I've found three approaches to progress bars for parallel processes on
CRAN:

- Importing 'snow' (not 'parallel') internals like sendCall and
  implementing parallel processing on top of them (doSNOW). This has
  the downside of having to write higher-level code from scratch using
  undocumented interfaces.

- Splitting the workload into length(cluster)-sized chunks and
  processing them in separate parLapply() calls between updating the
  progress bar (pbapply). This approach trades off parallelism against
  the precision of the progress information: the function has to wait
  until all chunk elements have been processed before updating the
  progress bar and submitting a new portion; dynamic load balancing
  becomes much less efficient.

- Adding local side effects to the function and detecting them while
  the parallel function is running in a child process (parabar). A
  clever hack, but much harder to extend to distributed clusters.

With recvData and recvOneData becoming exported in R-4.4 [*], another
approach becomes feasible: wrap the cluster object (and all nodes) into
another class, attach the progress callback as an attribute, and let
recvData / recvOneData call it. This makes it possible to give wrapped
cluster objects to unchanged code, but requires knowing the precise
number of chunks that the workload will be split into.

Could it be feasible to add an optional .progress argument after the
ellipsis to parLapply() and its friends? We can require it to be a
function accepting (done_chunk, total_chunks, ...). If not a new
argument, what other interfaces could be used to get accurate progress
information from staticClusterApply and dynamicClusterApply?

I understand that the default parLapply() behaviour is not very
amenable to progress tracking, but when running
clusterMap(.scheduling = 'dynamic') spanning multiple hours if not
whole days, having progress information sets the mind at ease.

I would be happy to prepare code and documentation.

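
To make the callback shape concrete, here is a rough sketch of the
chunked approach (the second item in the list above) driving a callback
of the form (done_chunk, total_chunks, ...). The name
parLapplyProgress() is hypothetical and not part of 'parallel'; the
cl = NULL sequential fallback is my own assumption, added so the sketch
can be tried without a cluster. It is not meant as the proposed
implementation, only as an illustration of the interface.

```r
library(parallel)

# Hypothetical helper, NOT part of 'parallel'. Processes X in batches of
# length(cl) elements and calls .progress(done, total) after each batch.
# Assumption: cl = NULL means "run sequentially, one element per batch".
parLapplyProgress <- function(cl, X, fun, ...,
                              .progress = function(done, total, ...) NULL) {
  batch_size <- if (is.null(cl)) 1L else length(cl)
  # Group the indices of X into batches of at most batch_size elements.
  batches <- split(seq_along(X), ceiling(seq_along(X) / batch_size))
  total <- length(batches)
  result <- vector("list", length(X))
  for (i in seq_along(batches)) {
    idx <- batches[[i]]
    result[idx] <- if (is.null(cl)) {
      lapply(X[idx], fun, ...)
    } else {
      # Every batch waits for its slowest element before the next starts.
      parLapply(cl, X[idx], fun, ...)
    }
    .progress(i, total)
  }
  names(result) <- names(X)
  result
}
```

With a real cluster, something like cl <- makeCluster(2) followed by
parLapplyProgress(cl, 1:10, sqrt, .progress = function(done, total, ...)
message(done, "/", total)) should report one step per batch. The
barrier after every batch is the loss of parallelism described above.
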
If there is no time now, we can return to it after R-4.4 is released.

-- 
Best regards,
Ivan

[*] https://bugs.r-project.org/show_bug.cgi?id=18587

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel