I have now added this to the wishlist in Bugzilla: https://bugs.r-project.org/show_bug.cgi?id=18234
2021-10-07 09:26 GMT+02:00 "Andreas Kersting" <r-de...@akersting.de>: > Hi all, > > As part of RunGenCollect() (in src/main/memory.c), some maintenance on the > CHARSXP cache is done, namely unmarked nodes/CHARSXPs are removed from the > hash chains. This requires always touching all CHARSXP in the cache, > irrespective of the number of generations which were just garbage collected. > In a session with a big CHARSXP cache, this will significantly slow down gc > also when just collecting the youngest generation. > > However, this part of RunGenCollect() seems to be one of the few which can > easily be parallelized without the need for thread synchronization. And it > seems to be the one most profiting from parallelization. > > Attached patch (parallel_CHARSXP_cache.diff) implements parallelization over > the elements of R_StringHash and gives the following performance improvements > on my system when using 4 threads compared to R devel (revision 81008): > > Elapsed time for 200 non-full gc in a session after > > x <- as.character(runif(1e6))[] > gc(full = TRUE) > > 8sec -> 2.5sec. > > AND > > Elapsed time for five non-full gc in a session after > > x <- as.character(runif(5e7))[] > gc(full = TRUE) > > 21sec -> 6sec. > > In the patch, I dropped the two lines > > FORWARD_NODE(s); > FORWARD_NODE(CXHEAD(s)); > > because they are currently both no-ops (and would require synchronization if > they were not). They are no-ops because we have > > # define CXHEAD(x) (x) // in Defn.h > > and hence FORWARD_NODE(s)/FORWARD_NODE(CXHEAD(s)) is only called when s is > already marked, in which case FORWARD_NODE() does nothing. > > I used OpenMP despite the known issues of some of its implementations with > hanging after a fork, mostly because it was the easiest thing to do for a > PoC. I worked around this similar to e.g. data.table by using only one thread > in forked children. > > It might be worth considering making the parallelization conditional on the > size of the CHARSXP cache and use only the main thread if the cache is > (still) small. > > In the second attached patch (parallel_CHARSXP_cache_no_forward.diff) I > additionally no longer call FORWARD_NODE(R_StringHash) because this will make > the following call to PROCESS_NODES() iterate through all elements of > R_StringHash again which is unnecessary since all elements are either > R_NilValue or an already marked CHARSXP. I rather directly mark & snap > R_StringHash. In contrast to the parallelization, this only affects full gcs > since R_StringHash will quickly belong to the oldest generation. > > Attached gc_test.R is the script I used to get the previously mentioned and > more gc timings. > > To me this looks like a significant performance improvement, especially given > the little changeset. What do you think? > > Best regards, > Andreas > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel