Hello there, I'm here again.

With the help of the previous emails, I finally found out the real C
implementation of np.sum(x), where x is a Numpy array.

I found that there is a "template" function named @TYPE@_pairwise_sum in
numpy/core/src/umath/loops_utils.h.src which is the underlying C code of
np.sum(). I can understand the code but there is still something that I
didn't figure out.

Here is the code link:
https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/loops_utils.h.src#L80

The control logic is quite simple, if the size of the array which is going
to be summed up is smaller than 8, then we directly add them up; If the
size is greater than 8 but smaller than PW_BLOCKSIZE, then we sum a block
with 8 accumulators which will utilize the speed-up of AVX. Then if the
size is greater than PW_BLOCKSIZE, we divide the array into two halves and
recursively call function @TYPE@_pairwise_sum to calculate the two halves
respectively.

My confusion lies here: why should we divide the array into two halves and
recursively calculate this? Since we won't assign two threads to compute
the result in parallel, we can't get any profits from "divide and conquer".
Actually we waste some time and memory in calling functions and operations
in heap.

There must be something that I didn't take into consideration. It will be
appreciated if you can point it out! Thanks in advance~
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

Reply via email to