I have now added this to the wishlist in Bugzilla: 
https://bugs.r-project.org/show_bug.cgi?id=18234

2021-10-07 09:26 GMT+02:00 "Andreas Kersting" <r-de...@akersting.de>:
> Hi all,
> 
> As part of RunGenCollect() (in src/main/memory.c), some maintenance on the 
> CHARSXP cache is done, namely unmarked nodes/CHARSXPs are removed from the 
> hash chains. This requires always touching all CHARSXP in the cache, 
> irrespective of the number of generations which were just garbage collected. 
> In a session with a big CHARSXP cache, this will significantly slow down gc 
> also when just collecting the youngest generation.
> 
> However, this part of RunGenCollect() seems to be one of the few which can 
> easily be parallelized without the need for thread synchronization. And it 
> seems to be the one most profiting from parallelization.
> 
> Attached patch (parallel_CHARSXP_cache.diff) implements parallelization over 
> the elements of R_StringHash and gives the following performance improvements 
> on my system when using 4 threads compared to R devel (revision 81008):
> 
> Elapsed time for 200 non-full gc in a session after
> 
> x <- as.character(runif(1e6))[]
> gc(full = TRUE)
> 
> 8sec -> 2.5sec.
> 
> AND
> 
> Elapsed time for five non-full gc in a session after
> 
> x <- as.character(runif(5e7))[]
> gc(full = TRUE)
> 
> 21sec -> 6sec.
> 
> In the patch, I dropped the two lines 
> 
> FORWARD_NODE(s);
> FORWARD_NODE(CXHEAD(s));
> 
> because they are currently both no-ops (and would require synchronization if 
> they were not). They are no-ops because we have
> 
> # define CXHEAD(x) (x)  // in Defn.h
> 
> and hence FORWARD_NODE(s)/FORWARD_NODE(CXHEAD(s)) is only called when s is 
> already marked, in which case FORWARD_NODE() does nothing.
> 
> I used OpenMP despite the known issues of some of its implementations with 
> hanging after a fork, mostly because it was the easiest thing to do for a 
> PoC. I worked around this similar to e.g. data.table by using only one thread 
> in forked children.
> 
> It might be worth considering making the parallelization conditional on the 
> size of the CHARSXP cache and use only the main thread if the cache is 
> (still) small.
> 
> In the second attached patch (parallel_CHARSXP_cache_no_forward.diff) I 
> additionally no longer call FORWARD_NODE(R_StringHash) because this will make 
> the following call to PROCESS_NODES() iterate through all elements of 
> R_StringHash again which is unnecessary since all elements are either 
> R_NilValue or an already marked CHARSXP. I rather directly mark & snap 
> R_StringHash. In contrast to the parallelization, this only affects full gcs 
> since R_StringHash will quickly belong to the oldest generation.
> 
> Attached gc_test.R is the script I used to get the previously mentioned and 
> more gc timings.
> 
> To me this looks like a significant performance improvement, especially given 
> the little changeset. What do you think?
> 
> Best regards,
> Andreas
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to