On Fri, Jul 31, 2020 at 1:40 PM Peter Steinbach <p.steinb...@hzdr.de> wrote:
> Dear numpy devs and interested readers,
>
> as a day-to-day user, it occurred to me that having a quick look into the
> contents and extents of arrays is well doable with numpy. numpy offers a
> rich set of methods for this. However, very often I observe myself and
> others just wanting to see if the values of an array have a certain
> min/max or mean, or how wide the range of values is.
>
> I hence sat down to write a summary function that returns a string of
> hand-picked summary statistics for a quick inspection. I propose to
> include it into numpy and would love to have your feedback on this idea
> before I submit a PR. Here is the core functionality:
>
> Examples
> --------
> >>> a = np.random.normal(size=20)
> >>> print(summary(a))
>         min    25perc      mean     stdev    median    75perc       max
>   -2.289870 -2.265757 -0.083213  1.115033 -0.162885 -2.217532  1.639802
> >>> a = np.reshape(a, newshape=(4,5))
> >>> print(summary(a,axis=1))
>         min    25perc      mean     stdev    median    75perc       max
> 0 -0.976279 -0.974090  0.293003  1.009383  0.466814 -0.969712  1.519695
> 1 -0.468854 -0.467739  0.184139  0.649378 -0.036762 -0.465510  1.303144
> 2 -2.289870 -2.276455 -0.324450  1.230031 -0.289008 -2.249625  1.111107
> 3 -1.782239 -1.777304 -0.485546  1.259598 -1.236190 -1.767434  1.639802
>
> So you see, it is merely a tiny helper function that can aid practitioners
> and data scientists to get a quick insight on what an array contains.
>
> First off, here is the code:
>
> https://github.com/psteinb/numpy/blob/summary-function/numpy/lib/utils.py#L1021
>
> I put it there as I am not sure at this point if the community would
> appreciate such a function or not. Judging from the tests, lib/utils.py
> appears to be a place for undocumented functions. So to resolve this and
> prepare a proper PR, please let me know where this summary function could
> reside!
>

This seems to be more the domain of scipy.stats and statsmodels.
Statsmodels already does a good job with this; in SciPy there's stats.describe
(https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html),
which is quite similar to what you're proposing. Could you think about whether
scipy.stats.describe does what you want, and if there's room to improve it
(perhaps add a `__repr__` and/or a `_repr_html_` for pretty-printing)?

Cheers,
Ralf

> Second, please give me your thoughts on the summary function's output?
> Should the number of digits be configurable? Should the columns be
> configurable? Is it ok to honor the axis parameter which is found in so
> many numpy functions?
>
> Last but not least, let me stress that this is my first-time contribution
> to numpy. I love the library and would like to contribute something back.
> So bear with me if my code violates best practices in your community for
> now. I'll bite my teeth into the formalities of a github PR once I get
> support from the community and the core devs.
>
> I think that a summary function would be a valuable addition to numpy!
> Best,
> Peter
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
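
For readers following the thread, here is a minimal sketch of the kind of
helper being discussed, using only NumPy. It is not the code from the linked
branch: the names summary_stats and summary, the precision parameter, and the
column layout are assumptions made for illustration. The column labels are
copied from the example output in the proposal. Note that np.percentile takes
q on a 0-100 scale, so the quartiles are requested as 25 and 75; the 25perc
and 75perc values in the quoted output sit just above min, which may indicate
a 0-1 scale was used there, but that is only a guess. For comparison,
scipy.stats.describe reports nobs, minmax, mean, variance, skewness and
kurtosis, and does not include the median or quartiles.

```python
import numpy as np

# Column labels copied from the example output in the proposal.
COLUMNS = ("min", "25perc", "mean", "stdev", "median", "75perc", "max")


def summary_stats(a, axis=None):
    """Hypothetical helper: the seven statistics stacked along a new last axis."""
    a = np.asarray(a, dtype=float)
    # np.percentile expects q in [0, 100], so the quartiles are 25 and 75.
    q25, q75 = np.percentile(a, [25, 75], axis=axis)
    stats = (
        np.min(a, axis=axis),
        q25,
        np.mean(a, axis=axis),
        np.std(a, axis=axis),
        np.median(a, axis=axis),
        q75,
        np.max(a, axis=axis),
    )
    return np.stack(stats, axis=-1)


def summary(a, axis=None, precision=6):
    """Hypothetical helper: format summary_stats as a plain-text table."""
    rows = summary_stats(a, axis=axis).reshape(-1, len(COLUMNS))
    width = precision + 8
    show_index = axis is not None  # row labels only make sense with an axis
    header = ("    " if show_index else "") + "".join(
        f"{name:>{width}}" for name in COLUMNS
    )
    lines = [header]
    for i, row in enumerate(rows):
        prefix = f"{i:<4d}" if show_index else ""
        lines.append(prefix + "".join(f"{v:>{width}.{precision}f}" for v in row))
    return "\n".join(lines)


# Usage with a fixed seed so repeated runs print the same table.
rng = np.random.default_rng(0)
a = rng.normal(size=20)
print(summary(a))
print(summary(a.reshape(4, 5), axis=1))
```

Splitting the computation (summary_stats) from the formatting (summary) is one
way to address the questions about configurable digits and columns: the
numeric result stays reusable, and only the string layout depends on
precision.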