Dear all,

I’m working as a data scientist in a major tech company. I have been using R for almost 20 years now and there’s one issue that’s been bugging me of late. I apologize in advance if this has been discussed before.

R has traditionally been used for running short scripts or data analysis notebooks, but there has recently been growing interest in developing full applications in the language. Three examples come to mind:

1) The Shiny web application framework, which facilitates the development of rich, interactive web applications
2) The httr package, which provides lower-level facilities than Shiny for writing web services
3) Batch jobs run by data scientists according to, say, a cron schedule

Compared with other languages, R’s support for such applications is rather poor. The Rscript program is generally used to run an R script or an arbitrary R expression, but I feel it suffers from a few problems:

1) It encourages developers of batch jobs to provide their code in a single R file (bad for code structure and unit-testability)
2) It provides no way to deal with dependencies on other packages
3) It provides no way to "run" an application provided as an R package

For example, let’s say I want to run a Shiny application that I provide as an R package (to keep the code modular, to benefit from unit tests, and to declare dependencies properly). I would then need to a) uncompress my R package, b) somehow ensure my dependencies are installed, and c) call runApp(). This can get tedious, fast.

Other languages let the developer package their code in "runnable" artefacts, and let the developer specify the main entry point. The mechanics depend on the language but are remarkably similar, and suggest a way to implement this in R. Through declarations in some file, the developer can often specify dependencies and declare where the program’s "main" function resides.
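
To make steps a) to c) above concrete, here is a minimal sketch in base R of what has to be done by hand today. The helper names declared_dependencies() and run_unpacked_app(), the pkg_dir argument, and the convention that the Shiny app lives under inst/app in the package are all assumptions made for illustration; none of them belongs to an existing tool.

```r
# Hypothetical helpers; 'pkg_dir' is the path to an uncompressed source package,
# e.g. obtained in step a) with utils::untar("myapp_0.1.tar.gz").

# Return the package names listed under Depends and Imports in DESCRIPTION.
declared_dependencies <- function(pkg_dir) {
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                   fields = c("Depends", "Imports"))
  deps <- unlist(strsplit(desc[!is.na(desc)], ","))
  # Drop version constraints such as "(>= 1.0)" and surrounding whitespace.
  deps <- trimws(sub("\\(.*\\)", "", deps))
  # "R" may appear in Depends, but it is not a package.
  setdiff(deps[nzchar(deps)], "R")
}

# Install whatever is missing, install the package itself, then start the app.
# Assumes the app directory is shipped as inst/app (an illustrative convention).
run_unpacked_app <- function(pkg_dir) {
  deps <- declared_dependencies(pkg_dir)
  installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
  if (any(!installed)) install.packages(deps[!installed])      # step b)
  install.packages(pkg_dir, repos = NULL, type = "source")
  pkg_name <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                       fields = "Package")[1, 1]
  shiny::runApp(system.file("app", package = pkg_name))        # step c)
}

# Usage (not run here, since it installs packages and blocks on the app):
# run_unpacked_app("path/to/myapp")
```

Every project that ships an application as a package ends up writing some variant of this glue, and that repetition is the tedium in question.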

Consider Java:

  Artefact: .jar file
  Declarations file: Manifest file
  Entry point: declared as 'Main-Class'
  Executed as: java -jar <jarfile>

Or Python:

  Artefact: Python package, typically as .tar.gz source distribution file
  Declarations file: setup.py (which specifies dependencies)
  Entry point: special __main__.py module inside the package
  Executed as: python -m <package>

R already has much of this machinery:

  Artefact: R package
  Declarations file: DESCRIPTION
  Entry point: ?
  Executed as: ?

I feel that R could benefit from letting the developer specify, possibly in DESCRIPTION, how to "run" the package. The package could then be run through a new R CMD command, for example:

  R CMD RUN <package> <args>

I’m sure there are plenty of wrinkles in this idea that need to be ironed out, but is this something that has ever been considered, or that is on R’s roadmap?

Thanks for reading so far,

David Lindelöf, Ph.D.
+41 (0)79 415 66 41 or skype:david.lindelof
http://computersandbuildings.com
Follow me on Twitter: http://twitter.com/dlindelof

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel