Dear numpy devs and interested readers,

As a day-to-day user, I find that numpy offers a rich set of methods for
taking a quick look at the contents and extents of an array. However, I very
often observe myself and others wanting only to see whether an array's values
have a certain min/max or mean, or how wide their range is.

I therefore sat down and wrote a summary function that returns a string of
hand-picked summary statistics for quick inspection. I propose including it
in numpy and would love to have your feedback on this idea before I submit a
PR. Here is the core functionality:

    Examples
    --------
    >>> a = np.random.normal(size=20)
    >>> print(summary(a))
                min     25perc       mean      stdev     median     75perc        max
          -2.289870  -2.265757  -0.083213   1.115033  -0.162885  -2.217532   1.639802
    >>> a = np.reshape(a, (4, 5))
    >>> print(summary(a, axis=1))
                min     25perc       mean      stdev     median     75perc        max
       0  -0.976279  -0.974090   0.293003   1.009383   0.466814  -0.969712   1.519695
       1  -0.468854  -0.467739   0.184139   0.649378  -0.036762  -0.465510   1.303144
       2  -2.289870  -2.276455  -0.324450   1.230031  -0.289008  -2.249625   1.111107
       3  -1.782239  -1.777304  -0.485546   1.259598  -1.236190  -1.767434   1.639802

As you can see, it is merely a small helper function that can help
practitioners and data scientists get quick insight into what an array
contains.
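To make the idea concrete, here is a minimal illustrative sketch of such a
function. It is not the code in the branch linked below; the layout (a 4-wide
row label followed by 11-wide columns with 6 decimals) is assumed from the
example output, stdev uses NumPy's default ddof=0, and when axis is given the
reduced axis is moved last so that every remaining position becomes one row.
Note that np.percentile expects q on a 0 to 100 scale (np.quantile takes 0 to
1); passing 0.25 instead of 25 yields a value just above the minimum.

```python
import numpy as np

COLUMNS = ("min", "25perc", "mean", "stdev", "median", "75perc", "max")


def summary(a, axis=None):
    """Return a text table of summary statistics for ``a`` (sketch only)."""
    a = np.asarray(a, dtype=float)
    if axis is None:
        # One row of statistics over all values, with an empty row label.
        rows, labels = a.reshape(1, -1), [""]
    else:
        # Move the reduced axis last and flatten the others: one row per position.
        rows = np.moveaxis(a, axis, -1).reshape(-1, a.shape[axis])
        labels = [str(i) for i in range(rows.shape[0])]

    # np.percentile takes q on a 0-100 scale; the result has shape (3, n_rows).
    q25, median, q75 = np.percentile(rows, [25, 50, 75], axis=1)
    values = (
        rows.min(axis=1),
        q25,
        rows.mean(axis=1),
        rows.std(axis=1),  # population std (ddof=0), NumPy's default
        median,
        q75,
        rows.max(axis=1),
    )

    width = 11  # room for a sign, a few integer digits and 6 decimals
    label_w = max([4] + [len(s) for s in labels])
    lines = [" " * label_w + "".join(f"{name:>{width}}" for name in COLUMNS)]
    for i, label in enumerate(labels):
        cells = "".join(f"{col[i]:>{width}.6f}" for col in values)
        lines.append(f"{label:>{label_w}}{cells}")
    return "\n".join(lines)
```

Because every statistic is computed along axis 1 of the 2-D `rows` array, the
same code handles 1-D input, the `axis=1` case shown above, and N-D arrays.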

First, here is the code:
https://github.com/psteinb/numpy/blob/summary-function/numpy/lib/utils.py#L1021

I put it there because I am not yet sure whether the community would welcome
such a function. Judging from the tests, lib/utils.py appears to be a place
for undocumented functions. To resolve this and prepare a proper PR, please
let me know where this summary function should live!

Second, what are your thoughts on the summary function's output? Should the
number of digits be configurable? Should the columns be configurable? Is it
OK to honor the axis parameter that is found in so many numpy functions?
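One possible answer to the configurability questions is to let callers pass
the columns as a mapping from column name to reducer, plus a precision
argument. The parameter names `columns` and `precision` and the
`DEFAULT_COLUMNS` mapping below are hypothetical and exist only to illustrate
the design; none of them is an existing NumPy API. Each reducer is assumed to
receive a 2-D array and to reduce along axis 1, so the axis handling is done
once and shared by every column.

```python
import numpy as np

# Hypothetical default column set: name -> reducer over axis 1 of a 2-D array.
DEFAULT_COLUMNS = {
    "min": lambda r: r.min(axis=1),
    "25perc": lambda r: np.percentile(r, 25, axis=1),  # q on a 0-100 scale
    "mean": lambda r: r.mean(axis=1),
    "stdev": lambda r: r.std(axis=1),
    "median": lambda r: np.median(r, axis=1),
    "75perc": lambda r: np.percentile(r, 75, axis=1),
    "max": lambda r: r.max(axis=1),
}


def summary(a, axis=None, columns=None, precision=6):
    """Hypothetical signature: `columns` and `precision` are illustrative only."""
    columns = DEFAULT_COLUMNS if columns is None else columns
    a = np.asarray(a, dtype=float)
    if axis is None:
        rows, labels = a.reshape(1, -1), [""]
    else:
        # Reduced axis last, remaining axes flattened into rows.
        rows = np.moveaxis(a, axis, -1).reshape(-1, a.shape[axis])
        labels = [str(i) for i in range(rows.shape[0])]

    results = {name: np.asarray(f(rows)) for name, f in columns.items()}
    # Column width grows with the requested precision and the longest name.
    width = max([precision + 5] + [len(n) + 2 for n in results])
    label_w = max([4] + [len(s) for s in labels])

    out = [" " * label_w + "".join(f"{n:>{width}}" for n in results)]
    for i, label in enumerate(labels):
        cells = "".join(f"{v[i]:>{width}.{precision}f}" for v in results.values())
        out.append(f"{label:>{label_w}}{cells}")
    return "\n".join(out)
```

For example, passing `columns={"range": lambda r: np.ptp(r, axis=1)}` together
with `precision=3` would print only how wide the range of values is, to three
decimals.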

Last but not least, let me stress that this is my first contribution to
numpy. I love the library and would like to give something back, so please
bear with me if my code violates your community's best practices for now.
I'll dig into the formalities of a GitHub PR once I have support from the
community and the core devs.

I think that a summary function would be a valuable addition to numpy!
Best,
Peter


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion