Very nice discussion. Thanks, Mark.
On Thu, Jul 19, 2018 at 3:20 AM, Mark van der Loo <mark.vander...@gmail.com> wrote:
>
> Dear Mike, et al,
>
> My remarks are not necessarily related to tidyverse packages. The main point
> is that there are various purposes and business cases for writing code, and
> they may imply different trade-offs. Let me illustrate with some examples. I
> will focus on non-standard evaluation and dependencies.
>
>
> TL;DR version: (and this is my opinion, nobody has to agree).
>
> 1/Interactive use: user-level NSE ok (as in the not-a-pipe operator, dplyr
> verbs), use any package you want.
> 2/Applications & local packages: avoid NSE within functions, package an
> application with dependencies you need, write code with maintainers in mind.
> 3/Published R-packages: avoid NSE within functions, minimize dependencies to
> what you cannot avoid.
>
> Do Read version:
>
> 1/ One-off data analyses or exploratory data analyses. There are cases where
> you don't need to guarantee that your code will run a few years from now:
> you are the only user and once your task is done, you quickly need to move
> on to the next. Especially in EDA, I write a lot of code that is nice to
> keep in a structured project folder but most probably: 1) I will be its only
> user and 2) I will use it only for this one small project so maintenance is
> not an issue. Although I'm writing code in scripts, it is very close to
> interactive work on the command-line.
>
> In such cases I use whatever gets the job done, including dplyr, tidyr,
> ggplot2, data.table, you name it. Here I basically don't care about
> dependencies and if I write functions there are usually not many of them.
>
>
> 2/ Writing applications or packages for internal use. When you write an
> application you are usually committing to a longer maintenance horizon and
> more than one user. Good chance that you're not the user and also good
> chance you're not the only developer.
> There are many implications to this
> but since you need to maintain things for a longer term, dependencies can
> become a liability. Fortunately, there are techniques to contain
> dependencies, for example using packrat or by manually setting up a library
> containing the packages your application depends on. You can even use a
> Docker instance. I have worked with custom libraries on several occasions.
> Since you (or someone else) are going to maintain the application, it is
> worthwhile to sit down and think about what is the best way to set up code
> so it remains maintainable. This includes questions like: can I easily
> understand what happens when reading it? What expertise does the maintainer
> need to understand it? Non-standard evaluation is generally much harder to
> reason about than standard evaluated code. This makes debugging and
> extending code harder in general.
>
> Now some people will argue that something like filter(data, x>1) is easier
> to understand than data[data$x > 1,,drop=FALSE]. I agree that on a very
> shallow level, filter(data, x>1) is easy to follow, in the sense of "oh the
> author probably wants to filter something here". But when you are debugging,
> you need to understand in much greater detail what happens: you need to know
> that 'x>1' is an expression that will be evaluated in the context of
> 'data'. You need to know about environments and parent environments and so
> on. All this knowledge can be avoided with data[data$x > 1,,drop=FALSE]. The
> latter also requires knowledge, but the concepts are much simpler, I think.
>
> Hence, I tend to avoid NSE when writing applications, although there may
> still be good reasons to do it. Dependencies can be contained in various
> ways so they are not such a big problem.
>
> 3/ Writing packages for CRAN. Now you are committing to long-term
> maintenance, and usage by interactive users, application builders, and
> possibly other package builders.
> Now a dependency becomes a direct liability
> in the sense that the author of your dependency can change interfaces and
> ask you to comply with the new version. Also, and especially because of
> recursive dependencies, importing a package may give you a whole tail of
> dependencies. This increases load time but also install-time, especially on
> systems where you need to install from source. Light-weight packages
> therefore have real advantages in applications that run many times (like a
> standalone script that is fired by users of a web-application or scripts
> that are scheduled to run at high frequency). It is also worth mentioning
> that an Imports or Depends puts a burden on the maintainer of the package
> you depend on: before submitting to CRAN, a pkg developer needs to check
> against all reverse dependencies (preferably recursively).
>
> So now, it is even more worthwhile to sit down and think about what is the
> best way to set up your code. Well thought out code can be a pleasure to
> maintain. Code that is hastily put together is a nightmare.
>
> My philosophy is as follows: I depend on other packages only when they offer
> something that I cannot fairly trivially do myself. This may have to do with
> a statistical or numerical method I do not want to or cannot implement, or
> it can have something to do with performance, for example. This does indeed
> exclude much of the tidyverse almost automatically. Many tools in tidyverse
> make already existing functionality easier for (interactive) use. But since
> much of the functionality is already present in base R, and because I find
> NSE hard to reason about in a programming context, I have until now not used
> any tidyverse packages as an Imports or Depends.
>
>
> Hope this helps,
> Best,
> Mark
>
>
> On Tue, 17 Jul 2018 at 23:10, Michael Hannon
> <jmhannon.ucda...@gmail.com> wrote:
>>
>> Thanks, Mark. Your points are well-taken, but I wouldn't refer to
>> this as a "small side-track".
>> You don't say so, but this could be
>> interpreted as a recommendation to avoid some or all of the
>> "tidyverse" in developing packages. I'm actually quite comfortable
>> doing the base-R-style programming you recommend. I've lately been
>> trying to make a point of using the "tidy" stuff, as that's what I'm
>> seeing almost exclusively from folks in my neighborhood these days.
>> ("Resistance is few-tile...")
>>
>> Also, it would seem to be a corollary that if the ultimate goal is to
>> make a package, then one shouldn't be using the convenience stuff
>> (pipes, dplyr, etc., etc.), even during the development stages. Can
>> you comment? Thanks.
>>
>> -- Mike
>>
>>
>> On Tue, Jul 17, 2018 at 2:53 AM, Mark van der Loo
>> <mark.vander...@gmail.com> wrote:
>> > Michael,
>> >
>> > Just a small side-track here. I would avoid using the not-a-pipe
>> > operator within functions or packages in general. It is great for
>> > interactive use, but it does make debugging and hence long-term
>> > maintenance of functions harder. There are two reasons for this.
>> > First, it hides intermediate results, and second, it adds several
>> > layers to the call stack, making the output of functions like
>> > traceback() harder to interpret. I have documented a simple example
>> > here: https://github.com/chriscardillo/norris/issues/1
>> > (scroll down a bit).
>> >
>> > Regarding learning about quosures and so on. If the literal names of
>> > data frames are known, you could consider replacing
>> >
>> >     some_var <- next_data_frame %>% dplyr::select(-amount,...
>> >
>> > with something simpler like
>> >
>> >     some_var <- next_data_frame[ !(names(next_data_frame) %in% c("amount", ... )) ]
>> >
>> > which might also save you some dependencies.
>> >
>> >
>> > Hope this helps,
>> > Best,
>> > Mark
>> >
>> >
>> > On Tue, 17 Jul 2018 at 11:28, Michael Hannon
>> > <jmhannon.ucda...@gmail.com> wrote:
>> >>
>> >> Thanks to John and Zhian for their recent and informative comments.
>> >>
>> >> Regarding check() and NSE: the moral seems to be that a little
>> >> learning is a dangerous thing. I'm off to try to bring quosure to
>> >> this issue.
>> >>
>> >> -- Mike
>> >>
>> >>
>> >> On Mon, Jul 16, 2018 at 2:38 PM, Zhian Kamvar <zkam...@gmail.com>
>> >> wrote:
>> >> > Using dplyr like that is for exploratory data analysis. You'll
>> >> > want to refer to dplyr's "Programming with dplyr" vignette for
>> >> > using dplyr in a package:
>> >> >
>> >> > https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
>> >> >
>> >> > Hope that helps.
>> >> >
>> >> > On Jul 16, 2018, at 22:13 , Michael Hannon
>> >> > <jmhannon.ucda...@gmail.com> wrote:
>> >> >
>> >> > Thanks, Georgi. I've changed my approach and now do what I gather is
>> >> > recommended practice: put all external package names into the
>> >> > "Imports" section of the DESCRIPTION file and then use the
>> >> > fully-qualified names for functions from those packages, as:
>> >> >
>> >> >     dplyr::select()
>> >> >
>> >> > The "check" operation is still not entirely "happy" with me, but it
>> >> > doesn't flag any errors, and the package builds and runs.
>> >> >
>> >> > BTW, one source of "complaints" from "check()" is evidently the use
>> >> > of NSE in the tidyverse functions. For instance, the line:
>> >> >
>> >> >     next_data_frame %>% dplyr::select(-amount,
>> >> >
>> >> > generates the message:
>> >> >
>> >> >     standardize_format: no visible binding for global variable
>> >> >     ‘amount’
>> >> >
>> >> > where, of course, "amount" is one of the column headings in
>> >> > "next_data_frame". There seems to be no harm done by this, and I
>> >> > plan to ignore such messages, but if there's some additional wisdom
>> >> > that applies here, I'd be happy to receive it.
>> >> >
>> >> > -- Mike
>> >> >
>> >> >
>> >> > On Sun, Jul 15, 2018 at 12:05 AM, Georgi Boshnakov
>> >> > <georgi.boshna...@manchester.ac.uk> wrote:
>> >> >
>> >> >
>> >> > It seems that the R session used by 'check' doesn't look in the
>> >> > library used by your interactive session. This discrepancy may
>> >> > happen since the check tools do not load the same Renviron files as
>> >> > interactive sessions. This may result in different libraries in
>> >> > interactive and 'check' sessions. See ?Startup, especially section
>> >> > Note.
>> >> > It is difficult to give more specific advice without details of your
>> >> > setup.
>> >> >
>> >> >
>> >> > Hope this helps,
>> >> > Georgi Boshnakov
>> >> >
>> >> >
>> >> > ________________________________________
>> >> > From: R-package-devel [r-package-devel-boun...@r-project.org] on
>> >> > behalf of Michael Hannon [jmhannon.ucda...@gmail.com]
>> >> > Sent: 15 July 2018 02:13
>> >> > To: r-package-devel@r-project.org
>> >> > Subject: [R-pkg-devel] Package builds, installs, and runs but does
>> >> > not pass devtools::check()
>> >> >
>> >> > Greetings. I'm working on a small package, and I'm using the
>> >> > devtools functions to create, build, etc., the package.
>> >> >
>> >> > As indicated in the subject line, I get no errors when I do:
>> >> >
>> >> >     build()
>> >> >     install()
>> >> >
>> >> >
>> >> > When I run a separate R session and load the package, i.e.,
>> >> >
>> >> >     library(my_pkg)
>> >> >
>> >> >
>> >> > the package loads without error, and the two exported functions
>> >> > appear to work as advertised.
>> >> >
>> >> > OTOH, if I include devtools::check() in the construction of the
>> >> > package, I consistently get an error:
>> >> >
>> >> >     * installing *source* package ‘my_pkg’ ...
>> >> >     ** R
>> >> >     ** preparing package for lazy loading
>> >> >     Error in loadNamespace(from, lib.loc = .library) :
>> >> >       there is no package called ‘dplyr’
>> >> >     Error : unable to load R code in package 'my_pkg'
>> >> >
>> >> > Clearly there *is* a package called "dplyr" on my system (see the
>> >> > session info below, for instance). And, as I've mentioned, the code
>> >> > *does* run, and I can watch it successfully reading CSV files.
>> >> >
>> >> > Here's the relevant part of my DESCRIPTION file:
>> >> >
>> >> >     Depends: R (>= 3.4.4)
>> >> >     Imports: readr,
>> >> >         dplyr,
>> >> >         ggplot2,
>> >> >         purrr,
>> >> >         magrittr
>> >> >
>> >> > I suspect the problem may be that I'm misunderstanding something
>> >> > about the `import::from()` function, which I'm using for the first
>> >> > time to load required functions into my code. In each of the three
>> >> > files that use dplyr I have the line:
>> >> >
>> >> >     import::from(dplyr, mutate, filter, rename, select, setdiff, slice, "%>%")
>> >> >
>> >> > I've tried:
>> >> >
>> >> >     (1) putting that line in just one of the files (the lexically
>> >> >         first one)
>> >> >     (2) including different subsets of dplyr functions, as needed, in
>> >> >         the various files
>> >> >
>> >> > Needless to say, I haven't seen any improvement with any of the above
>> >> > (or any of the other thrashing I've done).
>> >> >
>> >> > If you can point me in the right direction, I'd appreciate it.
>> >> > Thanks.
>> >> >
>> >> > -- Mike
>> >> >
>> >> >
>> >> > session_info()
>> >> >
>> >> > Session info ----------------------------------------------------
>> >> >  setting  value
>> >> >  version  R version 3.4.4 (2018-03-15)
>> >> >  system   x86_64, linux-gnu
>> >> >  ui       X11
>> >> >  language en_US
>> >> >  collate  en_US.UTF-8
>> >> >  tz       America/Los_Angeles
>> >> >  date     2018-07-14
>> >> >
>> >> > Packages --------------------------------------------------------
>> >> >  package    * version date       source
>> >> >  assertthat   0.2.0   2017-04-11 CRAN (R 3.3.3)
>> >> >  base       * 3.4.4   2018-03-16 local
>> >> >  bindr        0.1.1   2018-03-13 CRAN (R 3.4.3)
>> >> >  bindrcpp     0.2.2   2018-03-29 CRAN (R 3.4.4)
>> >> >  compiler     3.4.4   2018-03-16 local
>> >> >  crayon       1.3.4   2017-09-16 CRAN (R 3.4.1)
>> >> >  datasets   * 3.4.4   2018-03-16 local
>> >> >  devtools   * 1.13.6  2018-06-27 CRAN (R 3.4.4)
>> >> >  digest       0.6.15  2018-01-28 CRAN (R 3.4.3)
>> >> >  dplyr      * 0.7.6   2018-06-29 CRAN (R 3.4.4)
>> >> >  glue         1.2.0   2017-10-29 CRAN (R 3.4.2)
>> >> >  graphics   * 3.4.4   2018-03-16 local
>> >> >  grDevices  * 3.4.4   2018-03-16 local
>> >> >  magrittr     1.5     2014-11-22 CRAN (R 3.2.2)
>> >> >  memoise      1.1.0   2017-04-21 CRAN (R 3.3.3)
>> >> >  methods    * 3.4.4   2018-03-16 local
>> >> >  pillar       1.3.0   2018-07-14 CRAN (R 3.4.4)
>> >> >  pkgconfig    2.0.1   2017-03-21 CRAN (R 3.4.0)
>> >> >  purrr        0.2.5   2018-05-29 CRAN (R 3.4.4)
>> >> >  R6           2.2.2   2017-06-17 CRAN (R 3.4.0)
>> >> >  Rcpp         0.12.17 2018-05-18 CRAN (R 3.4.4)
>> >> >  rlang        0.2.1   2018-05-30 CRAN (R 3.4.4)
>> >> >  stats      * 3.4.4   2018-03-16 local
>> >> >  tibble       1.4.2   2018-01-22 CRAN (R 3.4.3)
>> >> >  tidyselect   0.2.4   2018-02-26 CRAN (R 3.4.3)
>> >> >  utils      * 3.4.4   2018-03-16 local
>> >> >  withr        2.1.2   2018-03-15 CRAN (R 3.4.3)
>> >> >
>> >> >
>> >> > ______________________________________________
>> >> > R-package-devel@r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
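
The base R style discussed in the thread above (standard evaluation instead of NSE, no extra Imports) can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names drop_columns() and filter_above() and the example data are made up for this sketch and do not come from anyone's package. Because column names are passed as strings, the function bodies contain no bare column names, so there is nothing for check() to report as "no visible binding for global variable".

```r
# Hypothetical helpers for illustration; only base R is used.

# Drop columns by name. %in% tests set membership, whereas `!=` against a
# vector of several names would compare element by element (with recycling).
drop_columns <- function(data, drop) {
  data[, !(names(data) %in% drop), drop = FALSE]
}

# Keep rows where a numeric column exceeds a threshold. The column name is
# passed as a string, so no bare column name appears in the function body.
filter_above <- function(data, column, threshold) {
  keep <- data[[column]] > threshold
  # which() discards NA comparisons; indexing rows with a logical NA would
  # instead return a row filled with NA.
  data[which(keep), , drop = FALSE]
}

# Made-up example data (assumption, not from the thread)
next_data_frame <- data.frame(
  id     = 1:4,
  amount = c(10, 20, 30, 40),
  x      = c(0.5, 2, NA, 3)
)

some_var <- drop_columns(next_data_frame, c("amount"))
big_x    <- filter_above(next_data_frame, "x", 1)

print(names(some_var))
print(big_x)
```

One behavioral difference is worth checking before swapping styles: data[data$x > 1, , drop = FALSE] returns an all-NA row for every NA in x, whereas dplyr::filter() drops such rows. The which() call in the sketch mimics the latter.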