pitrou commented on issue #45741:
URL: https://github.com/apache/arrow/issues/45741#issuecomment-2713540047
For the record, Pandas is slower, but not astonishingly so either:
* 10000 groups
```pycon
>>> import pyarrow as pa
>>> n = 10000
>>> a = pa.table({'group': list(range(n))*2, 'key': ['h']*n+['w']*n,
...               'value': range(n*2)})
>>> df = a.to_pandas()
>>> %timeit df.groupby('group').sum('value')
906 μs ± 406 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
* 100000 groups
```pycon
>>> n = 100000
>>> a = pa.table({'group': list(range(n))*2, 'key': ['h']*n+['w']*n,
...               'value': range(n*2)})
>>> df = a.to_pandas()
>>> %timeit df.groupby('group').sum('value')
6.76 ms ± 10.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```