[ https://issues.apache.org/jira/browse/ARROW-11758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292078#comment-17292078 ]
Yibo Cai commented on ARROW-11758: ---------------------------------- Implemented recursive and non-recursive pairwise summation, also naive summation. Naive summation suffers from severe round-off errors, and worse performance. *Non-recursive pairwise summation* is of best performance, it above 1x faster than recursive one. We should adopt it. Details at https://github.com/cyb70289/sum/ > [C++][Compute] Summation kernel round-off error > ----------------------------------------------- > > Key: ARROW-11758 > URL: https://issues.apache.org/jira/browse/ARROW-11758 > Project: Apache Arrow > Issue Type: Bug > Reporter: Yibo Cai > Assignee: Yibo Cai > Priority: Major > > From below test, summation kernel is of lower precision than numpy.sum. > Numpy implements pairwise summation [1] with O(logn) round-off error, better > than O(n) error from naive summation. > *sum.py* > {code:python} > import numpy as np > import pyarrow.compute as pc > t = np.arange(321000, dtype='float64') > t2 = t - np.mean(t) > t2 *= t2 > print('numpy sum:', np.sum(t2)) > print('arrow sum:', pc.sum(t2)) > {code} > *test result* > {noformat} > # Verified with wolfram alpha (arbitrary precision), Numpy's result is > correct. > $ ARROW_USER_SIMD_LEVEL=SSE4_2 python sum.py > numpy sum: 2756346749973250.0 > arrow sum: 2756346749973248.0 > $ ARROW_USER_SIMD_LEVEL=AVX2 python sum.py > numpy sum: 2756346749973250.0 > arrow sum: 2756346749973249.0 > {noformat} > [1] https://en.wikipedia.org/wiki/Pairwise_summation -- This message was sent by Atlassian Jira (v8.3.4#803005)