After the discussion in today's Arrow sync call, I do think it would be beneficial to come up with a formal process for deciding when the time is right to upgrade Arrow to a newer C++ standard. I suggest we consider a set of general metrics/criteria that try to summarize the benefits and drawbacks of such a change. Some metrics will be measurable but others will be qualitative. For the latter, we can use a consensus-based scale rating (1-5 with a meaning attached to each value). I am curious what approach other major C++ projects have used to resolve decisions on selecting a C++ standard (aside from critically required features)?
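As a rough illustration of the scale rating idea, here is a minimal Python sketch, assuming each participant submits one integer rating from 1 to 5 for a single qualitative metric. The scale labels, the `summarize_ratings` helper, and the choice of the median as the consensus value are all my own assumptions for illustration, not an agreed process.

```python
from collections import Counter
from statistics import median

# Hypothetical meanings for the 1-5 scale; the real wording would need consensus.
SCALE = {
    1: "much worse",
    2: "somewhat worse",
    3: "no change",
    4: "somewhat better",
    5: "much better",
}


def summarize_ratings(ratings):
    """Summarize 1-5 ratings for one qualitative metric (hypothetical helper)."""
    if not ratings:
        raise ValueError("no ratings submitted")
    invalid = [r for r in ratings if r not in SCALE]
    if invalid:
        raise ValueError(f"ratings outside the 1-5 scale: {invalid}")
    counts = Counter(ratings)
    return {
        "responses": len(ratings),  # how many people answered (coverage)
        "median": median(ratings),  # assumed consensus value
        "distribution": {value: counts.get(value, 0) for value in SCALE},
    }


# Made-up ratings for "code readability", purely to show the output shape.
summary = summarize_ratings([4, 5, 3, 4, 2, 4])
print(summary)
```

The median is only one possible choice; it is less sensitive to a few extreme votes than the mean, and the response count gives a first handle on the coverage question.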
The criteria used to evaluate newer C++ standards need to fairly consider people in different roles with regard to the Arrow project, such as developers, contributors, C++ users, users of other languages (R, Python), and maintainers. Here is a possible (and likely incomplete) set of metrics:

Measurable metrics:
* code size (source and binary) - measured in bytes
* compilation time (consider each major Arrow component)
* runtime - what are the performance changes? (consider each major Arrow component)
* systems/OS/tools supported and deprecated
* ...

Qualitative metrics:
* code structure/maintainability - how would it improve development?
* code readability - ease of understanding details for new/current contributors?
* ...

I do think this approach will give us a better standpoint for deciding when to upgrade to a newer C++ standard. Nevertheless, there are complexities in implementing such an approach:
* selecting the "correct" metrics
* designing the scale rating
* How do we get the community to provide their opinion for the qualitative metrics? What is a "good enough" coverage?
* How do we summarize the results into a binary decision: upgrade vs not upgrade?
* ...

In the end, it might not be worthwhile to go through all this work; I am simply expressing an idea.

~Eduardo

On Wed, Jun 9, 2021 at 9:40 AM Antoine Pitrou <anto...@python.org> wrote:

> On Tue, 8 Jun 2021 17:37:30 -0500
> Jonathan Keane <jke...@gmail.com> wrote:
> > I've been digging a bit to try and put numbers on those users the Neal
> > mentions. Specifically, we know that requiring C++17 will mean that R
> > users on windows using versions of R before 4.0.0 will not be able to
> > compile/install arrow. Although R version 3.6 is no longer supported
> > by CRAN [1], many people hang on to older versions for an extended
> > period of time.
> >
> > We are still working on getting more solid numbers about how many
> > people might still be on these old versions, but here is what I have
> > so far:
> >
> > Using Rstudio's cran mirror logs of package installations [2] (and
> > with the help of Arrow datasets to process/filter these files 🎉) for
> > the period from 2020-05-18 [3] to today, for the installations that
> > have an r version reported approximately 27% of the windows package
> > installs are on versions before 4.0.0 (and therefore would be unable
> > to install arrow if we require C++17 right now).
>
> Is this because binary packages are forbidden in R-land? Do Windows
> users of R really install Arrow from source? Or is it really
> impossible to use a modern compiler when building R packages for R
> versions older than 4.0 ?
>
> Note the requirement we're proposing to bump is for *building* Arrow.
> Using binaries should not be affected, especially on Windows (on Linux,
> you must be a bit more careful, but normally the CentOS devtoolset
> should take care of that).
>
> Regards
>
> Antoine.