Hi! I got surprising results when comparing numpy and pyarrow performance.

    val = np.uint8(115)

numpy runs at about the same speed whether I compare against 115 or np.uint8(115):

    %timeit np.count_nonzero(data_np == val)
    591 µs ± 3.56 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    %timeit np.count_nonzero(data_np == 115)
    598 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Strangely, it is fastest for b"s":

    %timeit np.count_nonzero(data_np == b"s")
    403 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

pc.equal is about 2.8x slower than numpy for np.uint8(115):

    %timeit pc.equal(data_pa, val).sum().as_py()
    1.64 ms ± 8.23 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

but much, much slower for a plain 115:

    %timeit pc.equal(data_pa, 115).sum().as_py()
    15.6 ms ± 21.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and it fails for b"s":

    %timeit pc.equal(data_pa, b"s").sum().as_py()
    ArrowNotImplementedError: Function 'equal' has no kernel matching input types (uint8, binary)

I filed this as https://github.com/apache/arrow/issues/38640

Is there any chance of getting performance closer to numpy?

BR,
Jacek