dangotbanned commented on issue #47288: URL: https://github.com/apache/arrow/issues/47288#issuecomment-3597653670
We'd have a use for this in https://github.com/narwhals-dev/narwhals

[`ArrowSeries.sample`](https://github.com/narwhals-dev/narwhals/blob/0309e3f6a927a764efe89faee6b1e6722b1021a0/narwhals/_arrow/series.py#L644-L661) currently depends on `numpy` to provide the same functionality found in [`polars.Series.sample`](https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.sample.html).

[`pandas.Series.sample`](https://pandas.pydata.org/docs/reference/api/pandas.Series.sample.html) has a pretty similar API too.

### Example

```py
from __future__ import annotations

from typing import Any

import pyarrow as pa


def sample(
    arr: pa.ChunkedArray[Any],
    n: int | None,
    *,
    fraction: float | None,
    with_replacement: bool,
    seed: int | None,
) -> pa.ChunkedArray[Any]:
    import numpy as np

    num_rows = len(arr)
    # Derive the sample size from `fraction` when `n` is not given.
    if n is None and fraction is not None:
        n = int(num_rows * fraction)
    rng = np.random.default_rng(seed=seed)
    idx = np.arange(num_rows)
    # Draw `n` row positions, then gather the values at those positions.
    mask = rng.choice(idx, size=n, replace=with_replacement)
    return arr.take(mask)


ca = pa.chunked_array([[2, 4, 8, 0]])
result = sample(ca, n=10, with_replacement=True, fraction=None, seed=None)
print(result.to_pylist())
```

```
[8, 4, 4, 8, 0, 0, 8, 4, 4, 8]
```

Tbh we should probably be using [`pyarrow.arange`](https://arrow.apache.org/docs/dev/python/generated/pyarrow.arange.html) now anyway 😄

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
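
For reference, the index-drawing step in the example above does not strictly need `numpy`. A rough sketch using only the standard library is below. `sample_indices` is a hypothetical helper, not something narwhals or pyarrow provides. Falling back to a single row when neither `n` nor `fraction` is given is an assumption on my part, and the random stream differs from `numpy`'s, so the same seed will not reproduce the same sample.

```python
from __future__ import annotations

import random


def sample_indices(
    num_rows: int,
    n: int | None = None,
    *,
    fraction: float | None = None,
    with_replacement: bool = False,
    seed: int | None = None,
) -> list[int]:
    """Return row positions to take (hypothetical helper, not a library API)."""
    if n is not None and fraction is not None:
        raise ValueError("pass either `n` or `fraction`, not both")
    if n is None:
        # Assumption: fall back to a single row when neither argument is given.
        n = 1 if fraction is None else int(num_rows * fraction)
    rng = random.Random(seed)
    if with_replacement:
        # Each draw is independent, so positions may repeat.
        return rng.choices(range(num_rows), k=n)
    # Raises ValueError when n > num_rows, since positions cannot repeat.
    return rng.sample(range(num_rows), k=n)
```

The resulting positions could then be gathered with something like `arr.take(pa.array(indices, type=pa.int64()))`. This still builds the indices in Python rather than in Arrow, so it only removes the `numpy` dependency and is not a substitute for a native sampling kernel.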
