That's food for thought. I have to admit that I have forgotten almost
everything about linear algebra that I was ever taught -- and I was never
taught numerical accuracy concerns in this context, since we were allowed
to use only pencil and paper (in high school as well as college), so the
problems were always constructed to ensure that correct answers contained
nothing more complicated than 0, 0.5, 1 or sqrt(2). At this point I have a
hard time reproducing multiplication for two 2x2 matrices (the only thing I
remember is that it was the first example of something where AxB != BxA).
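
As a refresher on that last point, here is a minimal sketch of 2x2
multiplication using plain lists of rows. The matmul() helper and the
example matrices A and B are hypothetical illustrations, not part of any
proposed API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = matmul(A, B)  # [[2, 1], [4, 3]]: B on the right swaps the columns of A
BA = matmul(B, A)  # [[3, 4], [1, 2]]: B on the left swaps the rows of A
print(AB != BA)    # True
```

Integer entries are used here so that rounding plays no part in the result.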

What gives me hope though is that Steven has been thinking about this
somewhat seriously already, and given that he successfully chose what to
include or exclude for the statistics module, I trust him to know how much
to put into a Matrix class as well. Certainly I trust him to come up with a
reasonable strawman whose tires we can all kick.

My own strawman would be to limit a Matrix to 2-dimensionality -- I believe
that even my college linear algebra introduction (for math majors!) didn't
touch upon higher dimensionality, and I doubt that what I learned in high
school about the topic went beyond 3x3 (it might not even have treated
non-square matrices).

In terms of numerical care (that topic to which I never warmed up), which
operations from the OP's list need more than statistics._sum() when limited
to NxM matrices for single-digit N and M? (He named "matrix multiplication,
transposition, addition, linear problem solving, determinant.")
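
To make the question concrete, here is a rough sketch of how a single
entry of a matrix product can go wrong with naive accumulation. It uses
math.fsum() as a public stand-in for the private statistics._sum(); that
substitution, the helper names, and the sample values are assumptions made
for illustration only.

```python
import math


def dot_naive(xs, ys):
    """Accumulate the dot product left to right in plain floats."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_fsum(xs, ys):
    """Sum the same products with math.fsum, which tracks lost low-order bits."""
    return math.fsum(x * y for x, y in zip(xs, ys))


# One row and one column, as would appear in a single matrix-product entry.
row = [1e16, 1.0, -1e16]
col = [1.0, 1.0, 1.0]

naive = dot_naive(row, col)   # 0.0: the 1.0 is absorbed by 1e16, then cancelled
careful = dot_fsum(row, col)  # 1.0: the exact answer
print(naive, careful)
```

Note that fsum() only protects the summation step; the individual products
are still rounded (they happen to be exact in this example). Operations such
as determinants and linear solves involve subtractions of products, so a
careful sum alone may not be enough for them.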

On Thu, Aug 13, 2020 at 9:04 PM Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> Guido van Rossum writes:
>
>  > I was going to say that such a matrix module would be better off in
>  > PyPI, but then I recalled how the statistics module got created,
>  > and I think that the same reasoning from PEP 450 applies here too
>  > (https://www.python.org/dev/peps/pep-0450/#rationale).
>  >
>  > So I'd say go for it!
>
> I disagree that that rationale applies.  Let's consider where the
> statistics module stopped, and why I think that's the right place to
> stop.  Simple statistics on *single* variables, as proposed by PEP
> 450, are useful in many contexts to summarize data sets.  You see them
> frequently in newspaper and Wikipedia articles, serious blog posts, and
> even on Twitter.
>
> PEP 450 mentions, but does not propose to provide, linear regression
> (presumably ordinary least squares -- as with median and mode, there
> are many ways to compute a regression line).  The PEP mentions
> covariance and correlation coefficients once each, and remarks that
> the API design is unclear.  I think that omission was the better part
> of valor.  Even on Twitter, it's hard to abuse the combination of mean
> and standard deviation (without outright lies about the values, of
> course).  But most uses of correlation and regression are more or less
> abusive.  That's true even in more serious venues (Murray &
> Herrnstein's "The Bell Curve" comes immediately to mind).  Almost all
> uses of multiple regression outside of highly technical academic
> publications are abusive.
>
> I don't mean to say "keep power tools out of the reach of the
> #ToddlerInChief and his minions".[1]  Rather, I mean to say that most
> serious languages and packages for these applications (such as R, and
> closer to home numpy and pandas) provide substantial documentation
> suggesting *appropriate* use and pointing to the texts on algorithms
> and caveats.  Steven doesn't do that with statistics -- and correctly
> so, he doesn't need to.  None of the calculations he implements are in
> any way controversial as calculations.  To the extent that different
> users might want slightly different definitions of mode or median, the
> ones provided are good enough for stdlib purposes.  Nor are the
> interpretations of the results of the various calculations at all
> controversial.[2]
>
> But even a two-dimensional regression y on x is fraught.  Should we
> include the Anscombe[3] data and require a plotting function so users
> can see what they're getting into?  I think Steven would say that's
> *way* beyond the scope of his package -- and I agree.  Let's not go
> there.  At all.  Let users who need that stuff use packages that
> encourage them and help them do it right.
>
> I don't find the "teaching high school linear/matrix algebra" use case
> persuasive.  I taught "MBA Statistics for Liberal Arts Majors" for a
> decade.  Writing simple matrix classes was an assignment, and then
> they used their own classes to implement covariance, correlation, and
> OLS.  I don't think having a "canned" matrix class would have been of
> benefit to them -- a substantial fraction (10% < x < 50%) did get some
> idea of what was going on "inside" those calculations by programming
> them themselves plus a bit of printf debugging, which neither the
> linear algebra equations nor the sum operator I wrote on the
> whiteboard did.  I will say I wish I had Steven's implementation of
> sum() at hand back then to show them to give them some idea of the
> care that numerical accuracy demands.
>
> I cannot speak to engineering uses of matrix computations.  If someone
> produces use cases that fit into the list of operations proposed
> (addition, negation, multiplication, inverse, transposition, scalar
> multiplication, solving linear equation systems, and determinants), I
> will concede it's useful and fall back to +/- 0.
>
> However, I think the comparison to multivariate statistics is
> enlightening.  You see many two-dimensional tables in public
> discussions (even on Twitter!)  but they are not treated as matrices.
> Now, it's true that *every* basic matrix calculation (except
> multiplication by a scalar) requires the same kind of care that
> statistics._sum exerts, but having provided that, what have you
> bought?  Not a lot, as far as I can see -- applications of matrix
> algebra are extremely diverse, and many require just as much attention
> to detail as the basic operations do.
>
> In sum, I suspect that simply providing numerically stable algorithms
> for those computations isn't enough for useful engineering work -- as
> with multivariate statistics, you're not even halfway to useful and
> accurate computations, and the diversity is explosive.  How to choose?
>
>
> Footnotes:
> [1]  Any fans of cubic regression models for epidemiology?  No?  OK, then.
>
> [2]  They can be abused.  I did so myself just this morning, to tease
> a statistically literate friend.  But it takes a bit of effort.
>
> [3]
> https://stat.ethz.ch/R-manual/R-patched/library/datasets/html/anscombe.html
>
>

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7YQMHLU2ERPPGOMPE576KT7SO7M72TUN/