Just wanted to ping the mailing this again about this PR.  Just to recap,
some simplification has been done thanks to some suggestions by *@rgommers*,
though the question still remains whether or not leaving the *axis* parameter
in the Python API for now (given how complicated it is to add in the C API)
is acceptable.

I will say that in response to the concern of adding parameters such as
"out" and "keepdims" (should they be requested), we could avail ourselves
to functions like median
<https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L3528>
for
help as *@juliantaylor* pointed out.  The *scipy* library has dealt with
this problem as well in its *sparse* modules, so that is also a useful
resource.

Feedback on this issue would be much appreciated!  Thanks!

On Sun, May 22, 2016 at 1:36 PM, G Young <gfyoun...@gmail.com> wrote:

> After some discussion with *@rgommers*, I have simplified the code as
> follows:
>
> 1) the path to the original count_nonzero in the C API is essentially
> unchanged, save some small overhead with Python calling and the
> if-statement to check the *axis* parameter
>
> 2) All of the complicated validation of the *axis* parameter and
> acrobatics for getting the count is handled *only* after we cannot
> fast-track via a numerical, boolean, or string *dtype*.
>
> The question still remains whether or not leaving the *axis* parameter in
> the Python API for now (given how complicated it is to add in the C API) is
> acceptable.  I will say that in response to the concern of adding
> parameters such as "out" and "keepdims" (should they be requested), we
> could avail ourselves to functions like median
> <https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L3528> 
> for
> help as *@juliantaylor* pointed out.  The *scipy* library has dealt with
> this problem as well in its *sparse* modules, so that is also a useful
> resource.
>
> On Sun, May 22, 2016 at 1:35 PM, G Young <gfyoun...@gmail.com> wrote:
>
>> 1) Correction: The PR was not written with small arrays in mind.  I ran
>> some new timing tests, and it does perform worse on smaller arrays but
>> appears to scale better than the current implementation.
>>
>> 2) Let me put it out there that I am not opposed to moving it to C, but
>> right now, there seems to be a large technical brick wall up against such
>> an implementation.  So suggestions about how to move the code into C would
>> be welcome too!
>>
>> On Sun, May 22, 2016 at 10:32 AM, Ralf Gommers <ralf.gomm...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sun, May 22, 2016 at 3:05 AM, G Young <gfyoun...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have had a PR <https://github.com/numpy/numpy/pull/7177> open (first
>>>> draft can be found here <https://github.com/numpy/numpy/pull/7138>) for
>>>> quite some time now that adds an 'axis' parameter to *count_nonzero*.
>>>> While the functionality is fully in-place, very robust, and actually
>>>> higher-performing than the original *count_nonzero* function, the
>>>> obstacle at this point is the implementation, as most of the functionality
>>>> is now surfaced at the Python level instead of at the C level.
>>>>
>>>> I have made several attempts to move the code into C to no avail and
>>>> have not received much feedback from maintainers unfortunately to move this
>>>> forward, so I'm opening this up to the mailing list to see what you guys
>>>> think of the changes and whether or not it should be merged in as is or be
>>>> tabled until a more C-friendly solution can be found.
>>>>
>>>
>>> The discussion is spread over several PRs/issues, so maybe a summary is
>>> useful:
>>>
>>> - adding an axis parameter was a feature request that was generally
>>> approved of [1]
>>> - writing the axis selection/validation code in C, like the rest of
>>> count_nonzero, was preferred by several core devs
>>> - Writing that C code turns out to be tricky. Jaime had a PR for doing
>>> this for bincount [2], but closed it with final conclusion "the proper
>>> approach seems to me to build some intermediate layer over nditer that
>>> abstracts the complexity away".
>>> - Julian pointed out that this adds a ufunc-like param, so why not add
>>> other params like out/keepdims [3]
>>> - Stephan points out that the current PR has quite a few branches, would
>>> benefit from reusing a helper function (like _validate_axis, but that may
>>> not do exactly the right thing), and that he doesn't want to merge it as is
>>> without further input from other devs [4].
>>>
>>> Points previously not raised that I can think of:
>>> - count_nonzero is also in the C API [5], the axis parameter is now only
>>> added to the Python API.
>>> - Part of why the code in this PR is complex is to keep performance for
>>> small arrays OK, but there's no benchmarks added or result given for the
>>> existing benchmark [6]. A simple check with:
>>>   x = np.arange(100)
>>>   %timeit np.count_nonzero(x)
>>> shows that that gets about 30x slower (330 ns vs 10.5 us on my machine).
>>>
>>> It looks to me like performance is a concern, and if that can be
>>> resolved there's the broader discussion of whether it's a good idea to
>>> merge this PR at all. That's a trade-off of adding a useful feature vs.
>>> technical debt / maintenance burden plus divergence Python/C API. Also,
>>> what do we do when we merge this and then next week someone else sends a PR
>>> adding a keepdims or out keyword? For these kinds of additions it would
>>> feel better if we were sure that the new version is the final/desired one
>>> for the foreseeable future.
>>>
>>> Ralf
>>>
>>>
>>> [1] https://github.com/numpy/numpy/issues/391
>>> [2] https://github.com/numpy/numpy/pull/4330#issuecomment-77791250
>>> [3] https://github.com/numpy/numpy/pull/7138#issuecomment-177202894
>>> [4] https://github.com/numpy/numpy/pull/7177
>>> [5]
>>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_CountNonzero
>>> [6]
>>> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_ufunc.py#L70
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to