After the discussion in today's Arrow sync call, I think it would be
beneficial to come up with a formal process for deciding when the time is
right to upgrade Arrow to a newer C++ standard. I suggest we consider a
set of general metrics/criteria that summarize the benefits and drawbacks
of such a change. Some metrics will be measurable, but others will be
qualitative. For the latter, we can use a consensus-based rating scale
(1-5, with a meaning attached to each value). I am curious what
approach other major C++ projects have used to decide on a C++ standard
(aside from critically required features)?

The criteria used to evaluate newer C++ standards need to fairly consider
people in different roles with regard to the Arrow project, such as
developers, contributors, C++ users, users of other languages (R, Python),
and maintainers.
Here is a possible (and likely incomplete) set of metrics:

Measurable metrics:
* code size (source and binary) - measured in bytes
* compilation time (consider each major Arrow component)
* runtime - what are the performance changes? (consider each major Arrow
component)
* systems/OS/tools supported and deprecated
* ...

Qualitative metrics:
* code structure/maintainability - how would it improve development?
* code readability - ease of understanding details for new/current
contributors?
* ...

I think this approach would put us in a better position to decide when to
upgrade to a newer C++ standard.
Nevertheless, there are complexities in implementing such an approach:
* selecting the "correct" metrics
* designing the scale rating
* How do we get the community to provide their opinion for the qualitative
metrics? What is a "good enough" coverage?
* How do we summarize the results into a binary decision: upgrade vs not
upgrade?
* ...
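
As a rough illustration of the last point, here is a minimal Python sketch
of one possible way to fold the ratings into a yes/no answer. Everything in
it is an assumption for the sake of discussion: the metric names come from
the lists above, but the ratings, the weights, the mapping of measured
values onto the 1-5 scale, and the 3.5 threshold are all hypothetical.

```python
from statistics import median

# Hypothetical 1-5 ratings per qualitative metric, one entry per respondent
# (3 = neutral, above 3 = favors upgrading). Values are made up.
ratings = {
    "maintainability": [4, 5, 4, 3],
    "readability": [4, 3, 4, 4],
}

# Hypothetical measured metrics, assumed to be already mapped onto the same
# 1-5 scale by some agreed rule (that mapping is not shown here).
measured = {
    "binary_size": 3,
    "compile_time": 2,
    "runtime": 4,
    "platform_support": 2,
}

# Hypothetical weights expressing how much each metric matters.
weights = {
    "maintainability": 2,
    "readability": 1,
    "binary_size": 1,
    "compile_time": 1,
    "runtime": 2,
    "platform_support": 3,
}


def metric_scores(ratings, measured):
    """Collapse each qualitative metric to its median rating and merge
    the result with the measured metrics."""
    scores = {name: median(votes) for name, votes in ratings.items()}
    scores.update(measured)
    return scores


def weighted_score(scores, weights):
    """Weighted average of all metric scores, still on the 1-5 scale."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def decide(scores, weights, threshold=3.5):
    """Turn the weighted average into a binary decision."""
    if weighted_score(scores, weights) >= threshold:
        return "upgrade"
    return "do not upgrade"


scores = metric_scores(ratings, measured)
print(weighted_score(scores, weights), decide(scores, weights))
```

Using the median rather than the mean for the qualitative ratings is just
one choice; it keeps a few extreme votes from dominating, but it also hides
how divided the respondents are, so the spread would probably need to be
reported alongside it.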

In the end, it might not be worthwhile to go through all this work; I am
simply expressing an idea.

~Eduardo


On Wed, Jun 9, 2021 at 9:40 AM Antoine Pitrou <anto...@python.org> wrote:

> On Tue, 8 Jun 2021 17:37:30 -0500
> Jonathan Keane <jke...@gmail.com> wrote:
> > I've been digging a bit to try and put numbers on those users that Neal
> > mentions. Specifically, we know that requiring C++17 will mean that R
> > users on Windows using versions of R before 4.0.0 will not be able to
> > compile/install arrow. Although R version 3.6 is no longer supported
> > by CRAN [1], many people hang on to older versions for an extended
> > period of time.
> >
> > We are still working on getting more solid numbers about how many
> > people might still be on these old versions, but here is what I have
> > so far:
> >
> > Using RStudio's CRAN mirror logs of package installations [2] (and
> > with the help of Arrow datasets to process/filter these files 🎉) for
> > the period from 2020-05-18 [3] to today, for the installations that
> > have an R version reported, approximately 27% of the Windows package
> > installs are on versions before 4.0.0 (and therefore would be unable
> > to install arrow if we require C++17 right now).
>
> Is this because binary packages are forbidden in R-land?  Do Windows
> users of R really install Arrow from source?  Or is it really
> impossible to use a modern compiler when building R packages for R
> versions older than 4.0?
>
> Note the requirement we're proposing to bump is for *building* Arrow.
> Using binaries should not be affected, especially on Windows (on Linux,
> you must be a bit more careful, but normally the CentOS devtoolset
> should take care of that).
>
> Regards
>
> Antoine.
>
>
>
