Re: [R] [External] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?
library(dplyr)
my_df |>
  group_by(my_category) |>
  summarise(my_z = cor(my_x, my_y))

On Sat, Apr 9, 2022 at 4:37 AM Richard M. Heiberger wrote:

> look at
> ?mapply
> Apply a Function to Multiple List or Vector Arguments
>
> to see if that meets your needs
>
> > On Apr 08, 2022, at 21:26, Kelly Thompson wrote:
> >
> > #Q. How can I "apply" a function that takes two or more vectors as
> > arguments, such as cor(x, y), over a "category" or "grouping variable"
> > or "index"?
> > #I'm using cor() as an example, I'd like to find a way to do this for
> > any function that takes 2 or more vectors as arguments.
> >
> > #create example data
> >
> > my_category <- rep ( c("a","b","c"), 4)
> >
> > set.seed(12345)
> > my_x <- rnorm(12)
> >
> > set.seed(54321)
> > my_y <- rnorm(12)
> >
> > my_df <- data.frame(my_category, my_x, my_y)
> >
> > #review data
> > my_df
> >
> > #If i wanted to get the correlation of x and y grouped by category, I
> > could use this code and loop:
> >
> > my_category_unique <- unique(my_category)
> >
> > my_results <- vector("list", length(my_category_unique) )
> > names(my_results) <- my_category_unique
> >
> > #start i loop
> > for (i in 1:length(my_category_unique) ) {
> >   my_criteria_i <- my_category == my_category_unique[i]
> >   my_x_i <- my_x[which(my_criteria_i)]
> >   my_y_i <- my_y[which(my_criteria_i)]
> >   my_correl_i <- cor(x = my_x_i, y = my_y_i)
> >   my_results[i] <- list(my_correl_i)
> > } # end i loop
> >
> > #review results
> > my_results
> >
> > #Q. Is there a better or more "elegant" way to do this, using by(),
> > aggregate(), apply(), or some other function?
> >
> > #This does not work and results in this error message: "Error in
> > FUN(dd[x, ], ...) : incompatible dimensions"
> > by (data = my_x, INDICES = my_category, FUN = cor, y = my_y)
> >
> > #This does not work and results in this error message: "Error in
> > cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like
> > 'x' "
> > by (data = my_df, INDICES = my_category, FUN = function(x, y) { cor
> > (my_df$x, my_df$y) } )
> >
> > #if I wanted the mean of x by category, I could use by() or aggregate():
> > by (data = my_x, INDICES = my_category, FUN = mean)
> >
> > aggregate(x = my_x, by = list(my_category), FUN = mean)
> >
> > #Thanks!
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
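For readers who prefer base R, the grouped computation discussed in this thread can also be sketched with split() plus sapply(), with by() given a function of the per-group data frame, or with the mapply() approach suggested above. This is only a minimal sketch: it rebuilds the example data from the quoted question, and the result names cor_by_group, cor_by and cor_mapply are made up here for illustration.

```r
# Rebuild the example data from the original question
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# split() gives one data frame per category; sapply() applies cor() to each
cor_by_group <- sapply(
  split(my_df, my_df$my_category),
  function(d) cor(d$my_x, d$my_y)
)

# by() also works once FUN takes the per-group data frame as its argument
cor_by <- by(my_df, my_df$my_category, function(d) cor(d$my_x, d$my_y))

# mapply() walks over the two split vectors in parallel, group by group
cor_mapply <- mapply(cor, split(my_x, my_category), split(my_y, my_category))

cor_by_group
```

The same pattern should carry over to any function of two or more vectors: replace cor() inside the anonymous function, or pass further split() vectors to mapply().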
Re: [R] error with more 100 forked processes
You may be hitting the limit at around 100 workers because other connections are already open, e.g. file connections, capture.output(), etc.

If you want to use *forked* processing with more than 125 workers using bare-bone R, you can use parallel::mclapply() and friends, because they don't use socket connections to communicate between the main process and the workers.

If you don't need *forked* processing per se, there are other alternatives, as already pointed out above. As the author of the future framework (https://www.futureverse.org/), I obviously suggest you try that one. It's on CRAN and installs out of the box on all OSes. You get several alternatives for parallel backends. For *forked* processing, call plan(multicore) at the top of your script, and it'll parallelize via the parallel::mclapply() framework internally, so you won't have the connection limitation to worry about(*). You can also use plan(future.callr::callr) to parallelize via the callr package, which doesn't have the connection limitation either. Your code will be the same regardless of which one you end up using. For the front end, there's future.apply::future_lapply() et al. (parallel versions of the base lapply functions), furrr::future_map() et al. (parallel versions of purrr's map functions), and foreach with doFuture if you like the y <- foreach(...) %dopar% { ... } style.

(*) But there are other issues with forked processing, e.g. it might not be compatible with multi-threaded code used by some packages. This is a problem independent of futures per se.

Hope this helps

Henrik

On Fri, Apr 8, 2022 at 2:19 PM Ivan Krylov wrote:
>
> On Fri, 8 Apr 2022 22:02:25 +0200
> Guido Kraemer via R-help wrote:
>
> > > cl <- makeForkCluster(128)
> > Error in UseMethod("sendData") :
> >   no applicable method for 'sendData' applied to an object of class
> > "NULL"
>
> In order to communicate with the workers, R creates connection objects.
> Unfortunately, the memory for connection objects in R has a
> statically-defined limit of 128. (A few connections are used by
> default, and a few more will likely be used by user code during the
> actual program run.)
>
> Try increasing the limit in #define NCONNECTIONS in
> src/main/connections.c and re-compiling R.
>
> See also: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28
> According to Henrik Bengtsson, R should work well even with as many
> as 16381 possible connections, but then you may run into OS limits on
> file descriptors.
>
> --
> Best regards,
> Ivan
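The fork-based route described above can be sketched with nothing but the parallel package that ships with R. This is a minimal illustration, not a benchmark: the toy task slow_square and the worker count of 2 are assumptions made for the example, and forking is not available on Windows, where mc.cores has to be 1.

```r
library(parallel)

# Connections currently in use by this session (stdin, stdout, stderr, ...);
# these count against R's fixed connection table
n_open <- nrow(showConnections(all = TRUE))

# Toy task standing in for real work (hypothetical, for illustration only)
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply() forks the current R process for each chunk of work; as noted
# in the message above, it does not rely on R socket connections
res <- unlist(mclapply(1:8, slow_square, mc.cores = n_cores))

res
```

On a machine with many CPUs, n_cores would be raised accordingly; the caveat about forked processing and multi-threaded code still applies.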
[R] error with more 100 forked processes
I am trying to run a parallel job on a computer with many CPUs and get the following error:

> library(parallel)
> cl <- makeForkCluster(128)
Error in UseMethod("sendData") :
  no applicable method for 'sendData' applied to an object of class "NULL"

If I scale down to 100 CPUs it doesn't produce an error. I can reproduce this with a self-compiled R 4.1.3 on Ubuntu 20.04 and Manjaro, as well as with the R binaries that come with both distributions.

--
Guido Kraemer
Max Planck Institute for Biogeochemistry Jena
Department for Biogeochemical Integration
Hans-Knöll-Str. 10
07745 Jena
Germany
phone: +49 3641 576293
e-mail: gkrae...@bgc-jena.mpg.de
Re: [ESS] [External] Emacs 28.1 Released
Hi,

Vincent, you might want to connect with David Caldwell on the issues that you are facing. I noted that on this page:

https://emacsformacosx.com/about

he notes that there are changes to the app launcher that may be relevant to what you are experiencing. Whether there is an intentional change in behavior, or a possible bug, may be something to evaluate.

If people find value in your distributions, given ease of use considerations, it seems worthwhile to continue to support them.

For Rich, note that since last month, with the Emacs 27.2 release, the macOS Emacs binaries from David Caldwell are universal, supporting Apple silicon.

More generally, for some time, even with Emacs 26.x, I had been using the MELPA-based ESS code, rather than the 18.10.2 release code, along with some tweaks in my .emacs to try to work around some of the ESS issues that became evident with Emacs 27.x. ESS 18.10.2 is now over 3 years old, which is circa Emacs 26.1.

Regards,

Marc

On April 8, 2022 at 11:44:50 AM, Richard M. Heiberger via ESS-help (ess-help@r-project.org) wrote:

> Thank you Vincent for your distribution. Please keep it up.
>
> I used to do my own setup until xxx time ago and have used yours (both
> Windows and Mac) since.
> I override some of your personalizations with my own preferences and it
> works.
>
> When I switched my Mac to the M1 last year I had some difficulties with the
> Rosetta emulation of Emacs itself,
> so Simon Urbanek was kind enough to give me a native compilation of Emacs for
> the M1.
> I substituted that into what was otherwise your distribution and have been
> quite happy.
>
> For the future, I am very happy not to deal directly with the issues you
> mention in this email.
>
> Rich
>
> > On Apr 08, 2022, at 11:11, Vincent Goulet via ESS-help wrote:
> >
> > Hi,
> >
> > Thanks Marc (and Richard L through GitLab) for the heads up.
> >
> > I tried building my Emacs distribution (on macOS) and stumbled on a weird
> > problem: the 'site-lisp' directory within the application (e.g.
> > /Applications/Emacs/Emacs.app/Contents/Resources/site-lisp) is not included
> > in 'load-path' by default. Since this is where I bundle extensions, they
> > are not recognized by Emacs. Perhaps the issue is upstream with David
> > Caldwell's compilation; I'll have to check. I haven't yet taken the time to
> > check on Windows.
> >
> > That said, ESS 18.10.2 does not compile with Emacs 28.1. It appears it is
> > time to move forward to the development version of ESS. Those, like me, who
> > prefer the good ol' stable ESS 18.10 are otherwise stuck on Emacs 27.x. ;-)
> >
> > Over the past few years, Emacs has moved consistently towards the ELPA
> > package management system. Pretty much anyone able to use Emacs should now
> > be able to install extensions easily. Org has deprecated the .zip
> > distribution. Same for ESS de facto, at least currently. This leads me to
> > question whether maintaining my distribution remains that useful. Any
> > thoughts?
> >
> > (For anyone not familiar, my Emacs distributions for macOS and Windows are
> > stock GNU Emacs with ESS, AUCTeX, Org and some very minor configuration;
> > see
> > https://vigou3.gitlab.io/emacs-modified-macos;
> > https://vigou3.gitlab.io/emacs-modified-windows.)
> >
> > Best,
> >
> > v.
> >
> > Vincent Goulet
> > Professeur titulaire
> > École d'actuariat, Université Laval

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help
Re: [R] [External] add equation and rsqared to plot
Thanks, Bill. This is a subtlety I certainly did not understand.

Bert

On Fri, Apr 8, 2022 at 10:08 AM Bill Dunlap wrote:
>
> plotmath also accepts names and calls, which it treats as though they were
> single-element expressions. That is why quote() generally works.
> quote("string") or quote(123) does not invoke plotmath, as quote returns a
> literal string or number when given such a thing.
>
> plot(0:1,0:1,type="n")
> text(.2, .6, expression(phi^epsilon))
> text(.2, .4, quote(phi^epsilon))
> text(.7, .6, expression(1234567890123456))
> text(.7, .4, quote(1234567890123456))
>
> -Bill
>
> On Fri, Apr 8, 2022 at 9:49 AM Bert Gunter wrote:
>>
>> Yes, I also find it somewhat confusing. Perhaps this will help. I
>> apologize beforehand if I have misunderstood and you already know all
>> this.
>>
>> The key is to realize that plotmath works with **expressions**,
>> unevaluated forms that include special plotmath keywords, like 'atop',
>> and symbols. So...
>>
>> ## simple example with plotmath used in plot's title
>>
>> ## This will produce an error, as 'atop' is not an R function:
>> plot(1,1, main = atop(x,y))
>>
>> ## to make this work, we need an expression on the rhs of 'main =' . A
>> simple way to do this is to use quote():
>>
>> plot(1,1,main = quote(atop(x,y)))
>>
>> ## Note that this produces 'x' above 'y' **without quoting x and y**.
>> That's because
>> ## this is an expression that plotmath parses and evaluates according
>> to its own rules,
>> ## shown in ?plotmath
>>
>> ## Now suppose we have:
>> x <- 'first line'
>> y <- 'second line'
>>
>> ## and we want to display these quoted strings instead of 'x' and 'y'
>> in the title
>>
>> ## Then this will *not* work -- it gives the same result as before:
>> plot(1,1,main = quote(atop(x,y)))
>>
>> ## So what is needed here is R's "computing on the language"
>> capability to substitute
>> ## the quoted strings for x and y in the expression. Here are two
>> simple ways to do this:
>>
>> ## First using substitute()
>>
>> plot(1,1, main = substitute(atop(x,y), list (x =x, y = y)))
>>
>> ## Second, using bquote()
>>
>> plot(1,1, main = bquote(atop(.(x), .(y))))
>>
>> ## More complicated expressions can be built up using plotmath's rules.
>> ## But you need to be careful about distinguishing plotmath expressions and
>> ## ordinary R expressions. For example:
>>
>> x <- pi/4 ## a number
>>
>> ## WRONG -- will display as written. bquote() is the same as quote() here.
>> plot(1,1, main = bquote(sin(pi/4) == round(x,2)))
>>
>> ## WRONG -- substitutes the value of x (printed to the session default
>> ## number of digits) but leaves round() unevaluated. This is a mistake
>> ## in using bquote
>> plot(1,1, main = bquote(sin(pi/4) == round(.(x), 2)))
>>
>> ## RIGHT -- use of bquote
>> plot(1,1, main = bquote(sin(pi/4) == .(round(x,2))))
>> ## or -- using substitute
>> plot(1,1, main = substitute(sin(pi/4) == x, list(x = round(x,2))))
>>
>> Hope this is helpful and, again, apologies if I have misunderstood.
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Fri, Apr 8, 2022 at 7:42 AM PIKAL Petr wrote:
>> >
>> > Hallo David
>> >
>> > Fair enough. Thanks for your explanation, which told me what should be
>> > done. It works perfectly for my example, but I am still confused about how
>> > to get expressions given to atop (or other functions) evaluated. The help
>> > page does not enlighten me, so I am still puzzled.
>> >
>> > When I borrow the example from help,
>> >
>> > plot(1:10, type="n", xlab="", ylab="", main = "plot math & numbers")
>> > theta <- 1.23 ; mtext(bquote(hat(theta) == .(theta)), line= .25)
>> > for(i in 2:9)
>> >   text(i, i+1, substitute(list(xi, eta) == group("(",list(x,y),")"),
>> >                           list(x = i, y = i+1)))
>> >
>> > #this is OK
>> > ex1 <- expression(" first: {f * minute}(x) " == {f * minute}(x))
>> > ex2 <- expression(" second: {f * second}(x) "== {f *
>> > second}(x))
>> > text(1, 9.6, ex1, adj=0)
>> > text(1, 9.0, ex2, adj=0)
>> >
>> > #and this is not
>> > text(2, 8, expression(atop(ex1, ex2)))
>> > text(2, 7, substitute( atop(ex1, ex2), list(ex1=ex1,ex2=ex2)))
>> >
>> > #and this works
>> > text(2, 6, expression(atop(1,2)))
>> >
>> > I tried to use eval when calling atop, but it did not work either.
>> > Therefore some hint in the help page could be quite handy.
>> >
>> > Best regards
>> > Petr Pikal
>> >
>> > Best Regards
>> > RNDr. Petr PIKAL
>> > Research Manager
>> > PRECHEZA a.s.
>> > nábř. Dr. Edvarda Beneše 1170/24 | 750 02 Přerov | Czech Republic
>> > Tel: +420 581 252 256 | GSM: +420 724 008 364
>> > petr.pi...@precheza.cz | www.precheza.cz
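To tie the bquote() advice back to the subject of this thread, here is a minimal sketch that builds a two-line label (fitted equation over R squared) with atop(). The data set (cars) and the object names fit, b, r2, lab and stacked are assumptions chosen for illustration. The last part shows one way that may resolve the atop(ex1, ex2) case from the quoted message: an object created by expression() is a vector of calls, so the call is pulled out with [[1]] before it is substituted into another call.

```r
# Draw to a null PDF device so the sketch also runs non-interactively;
# drop this line (and dev.off() below) to see the plot on screen
pdf(file = NULL)

# Fit a simple regression on a built-in data set (illustrative choice)
fit <- lm(dist ~ speed, data = cars)
b   <- round(coef(fit), 2)                # intercept and slope
r2  <- round(summary(fit)$r.squared, 2)

# .() evaluates ordinary R code; everything else is left for plotmath
lab <- bquote(atop(y == .(b[[1]]) + .(b[[2]]) * x, R^2 == .(r2)))

plot(dist ~ speed, data = cars)
abline(fit)
text(5, 105, lab, adj = 0)

# Stacking two existing expression() objects: extract the calls first
ex1 <- expression(alpha + beta)
ex2 <- expression(gamma^2)
stacked <- substitute(atop(a, b), list(a = ex1[[1]], b = ex2[[1]]))
text(20, 20, stacked)

dev.off()
```

Rounding happens inside .() or before the call is built, which matches the "RIGHT" variants in Bert's message.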
Re: [R] pam() with more general dissimilarity / distance
I was asked in private, but reply in public, so others can also find this answer in the future:

On Fri, Apr 8, 2022 at 1:11 PM . wrote:

> Hello
> dear Dr. Maechler
> I have a question about "pam" function in the cluster package. In this
> function, we choose one of the euclidean or manhattan distances to
> calculate dissimilarity but in the mixed typed data sets the true index may
> be jaccard or other indicators.
> How can we allocate the "true" metric for each variable?
> Best regards

Yes, you can use pam() in two ways; see this part of the help page:

Arguments:

       x: data matrix or data frame, or dissimilarity matrix or object,
          depending on the value of the ‘diss’ argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable. All
          variables must be numeric. Missing values (NAs) _are_
          allowed -- as long as every pair of observations has at least
          one case not missing.

          In case of a dissimilarity matrix, ‘x’ is typically the output
          of daisy or dist. Also a vector of length n*(n-1)/2 is allowed
          (where n is the number of observations), and will be
          interpreted in the same way as the output of the
          above-mentioned functions. Missing values (NAs) are _not_
          allowed.

So, you can first use

  dx <- daisy(x, ...)

and use the correct distance between your observational units. After that you can use the computed distance / dissimilarity matrix (the `dx`) in your call to pam():

  px <- pam(dx, k=., )

I hope this helps you.

With best regards,
Martin

--
Martin Maechler
ETH Zurich
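The two-step recipe above (daisy() first, then pam() on the result) can be sketched as follows. The small mixed-type data frame, the choice of Gower's coefficient and k = 2 are all assumptions made up for illustration; they are not taken from the original question.

```r
library(cluster)

# Small mixed-type data set (made up for illustration)
set.seed(1)
x <- data.frame(
  height = c(rnorm(5, 160, 5), rnorm(5, 185, 5)),
  smoker = factor(rep(c("no", "yes"), each = 5)),
  grade  = ordered(rep(c("low", "mid", "high", "mid", "low"), 2),
                   levels = c("low", "mid", "high"))
)

# Step 1: compute dissimilarities that respect each column's type
dx <- daisy(x, metric = "gower")

# Step 2: cluster on the dissimilarity object; pam() detects that dx
# is a dissimilarity, so no metric is chosen at this stage
px <- pam(dx, k = 2)

table(cluster = px$clustering, smoker = x$smoker)
```

For binary columns where only joint presence should count, daisy()'s type argument (for example type = list(asymm = ...)) is the documented way to request asymmetric, Jaccard-style handling of those columns.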