Just wanted to ping the mailing this again about this PR. Just to recap, some simplification has been done thanks to some suggestions by *@rgommers*, though the question still remains whether or not leaving the *axis* parameter in the Python API for now (given how complicated it is to add in the C API) is acceptable.
I will say that in response to the concern of adding parameters such as "out" and "keepdims" (should they be requested), we could avail ourselves to functions like median <https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L3528> for help as *@juliantaylor* pointed out. The *scipy* library has dealt with this problem as well in its *sparse* modules, so that is also a useful resource. Feedback on this issue would be much appreciated! Thanks! On Sun, May 22, 2016 at 1:36 PM, G Young <gfyoun...@gmail.com> wrote: > After some discussion with *@rgommers*, I have simplified the code as > follows: > > 1) the path to the original count_nonzero in the C API is essentially > unchanged, save some small overhead with Python calling and the > if-statement to check the *axis* parameter > > 2) All of the complicated validation of the *axis* parameter and > acrobatics for getting the count is handled *only* after we cannot > fast-track via a numerical, boolean, or string *dtype*. > > The question still remains whether or not leaving the *axis* parameter in > the Python API for now (given how complicated it is to add in the C API) is > acceptable. I will say that in response to the concern of adding > parameters such as "out" and "keepdims" (should they be requested), we > could avail ourselves to functions like median > <https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L3528> > for > help as *@juliantaylor* pointed out. The *scipy* library has dealt with > this problem as well in its *sparse* modules, so that is also a useful > resource. > > On Sun, May 22, 2016 at 1:35 PM, G Young <gfyoun...@gmail.com> wrote: > >> 1) Correction: The PR was not written with small arrays in mind. I ran >> some new timing tests, and it does perform worse on smaller arrays but >> appears to scale better than the current implementation. >> >> 2) Let me put it out there that I am not opposed to moving it to C, but >> right now, there seems to be a large technical brick wall up against such >> an implementation. So suggestions about how to move the code into C would >> be welcome too! >> >> On Sun, May 22, 2016 at 10:32 AM, Ralf Gommers <ralf.gomm...@gmail.com> >> wrote: >> >>> >>> >>> On Sun, May 22, 2016 at 3:05 AM, G Young <gfyoun...@gmail.com> wrote: >>> >>>> Hi, >>>> >>>> I have had a PR <https://github.com/numpy/numpy/pull/7177> open (first >>>> draft can be found here <https://github.com/numpy/numpy/pull/7138>) for >>>> quite some time now that adds an 'axis' parameter to *count_nonzero*. >>>> While the functionality is fully in-place, very robust, and actually >>>> higher-performing than the original *count_nonzero* function, the >>>> obstacle at this point is the implementation, as most of the functionality >>>> is now surfaced at the Python level instead of at the C level. >>>> >>>> I have made several attempts to move the code into C to no avail and >>>> have not received much feedback from maintainers unfortunately to move this >>>> forward, so I'm opening this up to the mailing list to see what you guys >>>> think of the changes and whether or not it should be merged in as is or be >>>> tabled until a more C-friendly solution can be found. >>>> >>> >>> The discussion is spread over several PRs/issues, so maybe a summary is >>> useful: >>> >>> - adding an axis parameter was a feature request that was generally >>> approved of [1] >>> - writing the axis selection/validation code in C, like the rest of >>> count_nonzero, was preferred by several core devs >>> - Writing that C code turns out to be tricky. Jaime had a PR for doing >>> this for bincount [2], but closed it with final conclusion "the proper >>> approach seems to me to build some intermediate layer over nditer that >>> abstracts the complexity away". >>> - Julian pointed out that this adds a ufunc-like param, so why not add >>> other params like out/keepdims [3] >>> - Stephan points out that the current PR has quite a few branches, would >>> benefit from reusing a helper function (like _validate_axis, but that may >>> not do exactly the right thing), and that he doesn't want to merge it as is >>> without further input from other devs [4]. >>> >>> Points previously not raised that I can think of: >>> - count_nonzero is also in the C API [5], the axis parameter is now only >>> added to the Python API. >>> - Part of why the code in this PR is complex is to keep performance for >>> small arrays OK, but there's no benchmarks added or result given for the >>> existing benchmark [6]. A simple check with: >>> x = np.arange(100) >>> %timeit np.count_nonzero(x) >>> shows that that gets about 30x slower (330 ns vs 10.5 us on my machine). >>> >>> It looks to me like performance is a concern, and if that can be >>> resolved there's the broader discussion of whether it's a good idea to >>> merge this PR at all. That's a trade-off of adding a useful feature vs. >>> technical debt / maintenance burden plus divergence Python/C API. Also, >>> what do we do when we merge this and then next week someone else sends a PR >>> adding a keepdims or out keyword? For these kinds of additions it would >>> feel better if we were sure that the new version is the final/desired one >>> for the foreseeable future. >>> >>> Ralf >>> >>> >>> [1] https://github.com/numpy/numpy/issues/391 >>> [2] https://github.com/numpy/numpy/pull/4330#issuecomment-77791250 >>> [3] https://github.com/numpy/numpy/pull/7138#issuecomment-177202894 >>> [4] https://github.com/numpy/numpy/pull/7177 >>> [5] >>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_CountNonzero >>> [6] >>> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_ufunc.py#L70 >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion